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objects in indoor environments (e.g. [21,56]). 

In this work, we seek to substantially expand the cover- 
age of synthetic data, particularly objects and scenes from 
the natural world. We introduce Infinigen, a procedural 
generator of photorealistic 3D scenes of the natural world. 
Compared to existing sources of synthetic data, Infinigen is 
unique due to the combination of the following properties: 


e Procedural: Infinigen is not a finite collection of 3D 
assets or synthetic images; instead, it is a generator 
that can create infinitely many distinct shapes, textures, 
materials, and scene compositions. Every asset, from 
shape to texture, is entirely procedural, generated from 
scratch via randomized mathematical rules that allow 
infinite variation and composition. This sets it apart 
from datasets or dataset generators that rely on external 
assets. 


Diverse: Infinigen offers a broad coverage of objects 
and scenes in the natural world, including plants, 
animals, terrains, and natural phenomena such as fire, 
cloud, rain, and snow. 


Photorealistic: Infinigen creates highly photorealistic 
3D scenes. It achieves high photorealism by procedu- 
rally generating not only coarse structures but also fine 
details in geometry and texture. 


٠ Real geometry: unlike in video game assets, which 
often use texture maps to fake geometrical details (e.g. 
a surface appears rugged but is in fact flat), all geomet- 
ric details in Infinigen are real. This ensures accurate 
geometric ground truth for 3D reconstruction tasks. 


٠ Free and open-source: Infinigen builds on top of 
Blender [11], a free and open-source graphics tool. 
Infinigen’s code is released for free under the GPL 
license, same as Blender. Anyone can freely use 
Infinigen to obtain unlimited assets and renders *. 


*The output of GPL code is generally not covered by GPL. 
See www. gnu. org / licenses / gpl - faq. en. html # 
WhatCaselIsOutputGPL 


Abstract 


We introduce Infinigen, a procedural generator of photo- 
realistic 3D scenes of the natural world. Infinigen is entirely 
procedural: every asset, from shape to texture, is generated 
from scratch via randomized mathematical rules, using no 
external source and allowing infinite variation and composi- 
tion. Infinigen offers broad coverage of objects and scenes 
in the natural world including plants, animals, terrains, and 
natural phenomena such as fire, cloud, rain, and snow. In- 
finigen can be used to generate unlimited, diverse training 
data for a wide range of computer vision tasks including 
object detection, semantic segmentation, optical flow, and 
3D reconstruction. We expect Infinigen to be a useful re- 
source for computer vision research and beyond. Please visit 
infinigen.org for videos, code and pre-generated data. 


1. Introduction 


Data, especially large-scale labeled data [13,53], has 
been a critical driver of progress in computer vision. At the 
same time, data has also been a major challenge, as many 
important vision tasks remain starved of high-quality data. 
This is especially true for 3D vision, where accurate 3D 
ground truth is difficult to acquire for real images. 

Synthetic data from computer graphics is a promising so- 
lution to this data challenge. Synthetic data can be generated 
in unlimited quantity with high-quality labels. Synthetic data 
has been used in a wide range of tasks [4, 12,46,48,55,58,71], 
with notable successes in 3D vision, where models trained 
on synthetic data can perform well on real images zero- 
shot [28, 54, 82-85, 89]. 

Despite its great promise, the use of synthetic data in 
computer vision remains much less common than real 
images. We hypothesize that a key reason is the limited 
diversity of 3D assets: for synthetic data to be maximally 
useful, it needs to capture the diversity and complexity of the 
real world, but existing freely available synthetic datasets 
are mostly restricted to a fairly narrow set of objects and 
shapes, often driving scenes (e.g. [32, 71]) or human-made 


*work done while a student at Princeton University 


2306.09310v1 [cs.CV] 15 Jun 2023 


e 
° 


arXiv 


Figure 1. Randomly generated, non cherry-picked images produced by our system. From top left to bottom right: Forest, River, Underwater, 
Caves, Coast, Desert, Mountain and Plains. See Appendix for more samples. 


Blender node graphs (intuitive visual representation of pro- 
cedural rules often used by Blender artists) to Python code. 
We also develop utilities to render synthetic images and ex- 
tract common ground truth labels including depth, occlusion 
boundaries, surface normals, optical flow, object category, 
bounding boxes, and instance segmentation. Constructing 
Infinigen involves substantial software engineering: the lat- 
est main branch of the Infinigen codebase consists of 40,485 
lines of code. 

In this paper, we provide a detailed description of our 
procedural system. We also perform experiments to validate 
the quality of the generated synthetic data; our experiments 
suggest that data from Infinigen is indeed useful, especially 
for bridging gaps in the coverage of natural objects. Finally, 
we provide an analysis on computational costs including a 
detailed profiling of the generation pipeline. 

We expect Infinigen to be a useful resource for computer 
vision research and beyond. In future work, we intend to 
make Infinigen a living project that, through open-source 
collaboration with the whole community, will expand to 
cover virtually everything in the visual world. 


2. Related Work 


Synthetic data from computer graphics have been used in 
computer vision for a wide range of tasks [56,71]. We refer 
the reader to [65] for a comprehensive survey. Below we 
categorize existing work in terms of application domain, gen- 
eration method, and accessibility. Tab. 1 provides detailed 
comparisons. 

Application Domain. Synthetic datasets or dataset genera- 
tors have been developed to cover a variety of domains. The 


Infinigen focuses on the natural world for two reasons. 
First, accurate perception of natural objects is demanded 
by many applications, including geological survey, drone 
navigation, ecological monitoring, rescue robots, agriculture 
automation, but existing synthetic datasets have limited cov- 
erage of the natural world. Second, we hypothesize that the 
natural world alone can be sufficient for pretraining powerful 
“foundation models”—the human visual system was evolved 
entirely in the natural world; exposure to human-made ob- 
jects was likely unnecessary. 


Infinigen is useful in many ways. It can serve as a genera- 
tor of unlimited training data for a wide range of computer 
vision tasks, including object detection, semantic segmen- 
tation, pose estimation, 3D reconstruction, view synthesis, 
and video generation. Because users have access to all the 
procedural rules and parameters underlying each 3D scene, 
Infinigen can be easily customized to generate a large variety 
of task-specific ground truth. Infinigen can also serve as a 
generator of 3D assets, which can be used to build simulated 
environments for training physical robots as well as virtual 
embodied agents. The same 3D assets are also useful for 3D 
printing, game development, virtual reality, film production, 
and content creation in general. 


We construct Infinigen on top of Blender [11], a graphics 
system that provides many useful primitives for procedural 
generation. Utilizing these primitives we design and imple- 
ment a library of procedural rules to cover a wide range of 
natural objects and scenes. In addition, we develop utili- 
ties that facilitate creation of procedural rules and enable 
all Blender users including non-programmers to contribute; 
the utilities include a transpiler that automatically converts 


Figure 2. For each image (a), we have a high-res mesh (b), which readily yields Depth (c), Surface Normals (d), Occlusion Boundaries (e), 
Instance Segmentation masks (f), and 2D / 3D bounding boxes (g/h). From rendering metadata, we obtain Optical Flow (i), and material 
parameters such as Albedo (j), Lighting Intensity (k) and Specular Reflection (1). 


Procedural Procedural Provides 

Arrangement Assets Procedural Code ee Soiree 
No No N/A Grand Theft Auto 
No No N/A Grand Theft Auto 
No No N/A Grand Theft Auto 
No No N/A Adobe Mixamo [34] 
No No N/A Professional Designers 
No No N/A UE4Arch, UnrealEngine Marketplace [36] 
No No N/A PlannerSD [1] 
No No N/A UnrealEngine Marketplace [36] 
No No N/A Evermotion Architectures [17] 
No No N/A Scan2CAD [3], ShapeNet [10], Adobe Stock [35] 
No No N/A Blender Foundation [11] 
No No N/A Blender Foundation [11] 
No No N/A Professional Designers 
No No N/A ShapeNet [10], SceneNet [26] 
No No N/A 3D-FUTURE [20] 
Yes No No ShapeNet [10], Planner 5D [1] 
Yes No No Manufacturers / Kujiale [44] 
N/A Partial No Artist-Created Faces (textures, hair, clothing) 
Yes Partial No - 
Yes Partial No 7D-Labs [45] 
Yes Partial No CityEngine, OpenStreetMap [25], Manual Annotation 
Yes No Yes AI2-THOR [43], Professional Designers 
Yes No Yes ShapeNet [10], Google Scanned Objects [16] 
Yes Yes Yes None 


Free 


# Assets 


# Scenes 


# Triangles 


Synthetic Dataset Dorain Per-Scene in Total | in Total Assets 
GTA-V [71] Driving, Urban - - No 
MOTSynth [18] Urban - - No 
MVS-Synth [32] Driving, Urban - - - No 
DeformingThings4D [50] Animals/Humanoids 3 2K 2K Yes 
DeepFurniture [56] Indoor - 20K - No 
Robotrix [21] Indoor - 16 - No 
SUNCG [76] (+ [51,75,95]) Indoor - 46K 2.6K No 
TartanAir [90] In/Outdoor, Natural/Urban - 30 - No 
Hypersim [72] Indoor 100K-11M 461 59K No ($6000) 
OpenRooms [52] Indoor 1M 1.3K 3K No ($500) 
Sintel [9] Medieval, Natural 300K 27 - No ($12) 
Spring [63] Natural - 47 - No ($12) 
Structured3D [96] Indoor - 22K 472K No 
SceneNet-RGBD [61] Indoor 420K 57 5.1K Yes 
3D-Front [19] Indoor 60K 19K 13K Yes 
Jiang et al. [39] Indoor 2 oo 54K No 
InteriorNet [49] Indoor - 22M 1M No 
FaceSynthetics [91] Faces 7.4K - o0 No 
Meta-Sim2 [14] Driving, Urban - oo oo No 
Synscapes [87,92] Driving, Urban 25K 3 No 
ProcSy [41] Driving, Urban 2 oo oo Yes 
ProcTHOR [12] Indoor - oo 1.6K Yes 
Kubric [24] Scattered Objects 161K oo 52K Yes 
Infinigen (Ours) Natural a gneiss oo oo Yes 


Table 1. Comparison to existing synthetic datasets or generators. Ours is entirely procedural, relying on no external assets, and can produce 
infinite original assets and scenes. Many existing datasets use external, static asset libraries. Procedural generation is often limited to object 
placement or a subset of objects. The vast majority of datasets are also restricted to the built environment, especially indoor scenes. In 
terms of accessibility, many do not provide free assets or make code available. Many works do not report average triangles per scene; where 
possible, we calculate this using generous assumptions from the numbers they do report. Dashes represent numbers we were not able to 
obtain or estimate. In counting the number of assets, we exclude trivial modifications like re-lighting and re-scaling. 


TartanAir [90] and Sintel [9], include a mix of built and 
natural environments. There also exist datasets such as 
FlyingThings [60], FallingThings [86] and Kubric [24] 
that do not render realistic scenes and instead scatter 
(mostly artificial) objects against simple backgrounds. 
Synthetic humans are another important application domain, 
where high-quality synthetic data have been generated for 
understanding faces [91], pose [2,88], and activity [40,77]. 


built environment has been covered by the largest amount 
of existing work [9, 18, 24, 48, 50, 60, 86, 90] especially 
indoor scenes [12, 21,39, 47,51, 52,56, 72, 75, 76, 80,95, 96] 
and urban scenes [14, 18,32,41,71,87,92]. A significant 
source of synthetic data for the built environment comes 
from simulated platforms for embodied AI, such as AI2- 
THOR [43], Habitat [81], BEHAVIOR [78], SAPIEN [93], 
RLBench [37], CARLA [15]. Some datasets, such 5 


Asset Type Num. Generators Interpretable DOF 
Terrain 26 17 
Materials 50 271 
Weather, Fluid 19 61 
Rocks 4 12 
Small Plants 30 258 
Trees 3 26 
Creatures 39 315 
Scattering 11 110 
Total 182 1070 


Table 2. Approximate degrees of freedom, as a proxy of overall 
diversity. We count only distinct human-understandable parameters 
with useful ranges of interpolation, with the caveat that this could 
be an overestimate as not all parameters are fully independent. 
Some asset classes (e.g terrain) are based on physics simulation 
and have many more internal degrees of freedom not counted here. 
See Appendix D for a full list of named parameters. 


procedural, from shape to texture, from macro structures to 
micro details, without relying on any external asset. 
Accessibility. A synthetic dataset or generator is most 
useful if it is maximally accessible, i.e. it provides free 
access to assets and code with minimum use restrictions. 
However, few existing works are maximally accessible. 
Often the rendered images are provided, but underlying 
3D assets are unavailable, not free, or have significant use 
restrictions. Moreover, the code for procedural generation, 
if any, is often unavailable. 

Infinigen is maximally accessible. Its code is available 
under the GPL license. Anyone can freely use Infinigen to 
generate unlimited assets. 


3. Method 


Procedural Generation. Procedural generation refers to the 
creation of data through generalized rules and simulators. 
Where an artist might manually create the structure of a sin- 
gle tree by eye, a procedural system creates infinite trees by 
coding their structure and growth in generality. Developing 
procedural rules is a form of world modeling using compact 
mathematical language. 

Blender Preliminaries. We develop procedural rules pri- 
marily using Blender, an open-source 3D modelling software 
that provides various primitives and utilities. Blender repre- 
sents scenes as a hierarchy of posed objects. Users modify 
this representation by transforming objects, adding primi- 
tives, and editing meshes. Blender provides import/export 
for most common 3D file-formats. Finally, all operations 
in Blender can be automated using its Python API, or by 
inspecting its open-source code. 

For more complex operations, Blender provides an intu- 
itive node-graph interface. Rather than directly edit shader 
code to define materials, artists edit Shader Nodes to com- 
pose primitives into a photo-realistic material. Similarly, 
Geometry Nodes define a mesh using nodes representing 


Figure 3. Our Node Transpiler converts artist-friendly Node-Graphs 
(left) to procedural code (middle) which produces assets (right). 


Real Geometry 


Figure 4. Real-time optimized assets (left) often use low res. ge- 
ometry in conjunction with shading tricks and alpha-masked image 
textures to give the illusion of geometric detail. Infinigen assets 
(right) instead model objects in full geometric detail. Bottom left 
triangles show a random color per mesh face. 


Figure 5. Examples of a subset of our material generators. Columns 
1-4 are for terrain, 5-7 are for creatures, and 8 is miscellaneous. 


Some datasets focus on objects, not full scenes, to serve 
object-centric tasks such as non-rigid reconstruction [50], 
view synthesis [64], and 6D pose [31]. 

We focus on natural objects and natural scenes, which 
have had limited coverage in existing work. Even though 
natural objects do occur in many existing datasets such as 
urban driving, they are mostly on the periphery and have 
limited diversity. 

Generation Method. Most synthetic datasets are con- 
structed by using a static library of 3D assets, either ex- 
ternally sourced or made in house. The downside of a static 
library is that the synthetic data would be easier to overfit. 
Procedural generation has been involved in some existing 
datasets or generators [12, 24,29, 39,49], but is limited in 
scope. Procedural generation is only applied to either object 
arrangement or a subset of objects, e.g. only buildings and 
roads but not cars [41,92]. In contrast, Infinigen is entirely 
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Figure 6. Random, non cherry-picked terrain-only scenes. We sample 13 images for various natural scene types. From top left to bottom 
right; Mountains, Rainy river, Snowy mountains, Coastal sunrise, Underwater, Arctic icebergs, Desert, Caves, Canyons and Floating islands. 


peal 


Figure 8. Random, non cherry-picked leaves, flowers, mushrooms 
and pinecones. 


us to randomize graph structure not just input parameters. 
This tool makes node-graphs more expressive and allows 
easy integration with other procedural rules developed di- 
rectly in Python or C++. It also allows non-programmers to 
contribute Python code to Infinigen by making node-graphs. 
See Appendix E for more details. 


Generator Subsystems. Infinigen is organized into gener- 
ators, which are probabilistic programs each specialized to 
produce one subclass of assets (e.g. mountains or fish). Each 
has a set of high-level parameters (e.g. the overall height of 
a mountain), which reflect the external degrees of freedom 
controllable by the user. By default, we randomly sample 
these parameters according to distributions tuned to mirror 
the natural world, with no input from the user. However, 


See Appendix for more samples. 


Figure 7. Random, non cherry-picked images of simulated fire, 
smoke, waterfalls, and volcano eruptions. 


operators such as Poisson disk sampling, mesh boolean, ex- 
trusion etc. A finalized Geometry Node Tree is a generalized 
parametric CAD model, which produces a unique 3D object 
for each combination of its input parameters. These tools 
are intuitive and widely adopted by 3D artists. 

Although we use Blender heavily, not all of our procedu- 
ral modeling is done using node-graphs; a significant portion 
of our procedural generation is done outside Blender and 
only loosely interacts with Blender. 

Node Transpiler. As part of Infinigen, we develop a suite 
of new tools to speed up our procedural modeling. A no- 
table example is our Node Transpiler, which automates the 
process of converting node-graphs to Python code, as shown 
in Fig. 3. The resulting code is more general, and allows 
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Figure 9. Random, non cherry-picked procedural trees (left), cacti (top right) and bushes (bottom right). 


independent. Note that it is hard to quantify the internal de- 
grees of freedom, so the external degrees of freedom serve as 
a lower bound of the total degrees of freedom for our system. 
Material Generators. We provide 50 procedural material 
generators (Fig. 5). Each is composed of a randomized 
shader, specifying color and reflectance, and a local 
geometry generator, which generates corresponding fine 
geometric details. 

The ability to produce accurate ground-truth geometry is 
a key feature of our system. This precludes the use of many 
common graphics techniques such as Bump Mapping and 
Phong Interpolation [7,70]. Both manipulate face normals 
to give the illusion of detailed geometric textures, but do so 
in a way that cannot be represented as a mesh. Similarly, 
artists often rely on image textures or alpha channel masking 
to give the illusion of high res. meshes where none exist. All 
such shortcuts are excluded from our system. See Fig. 4 for 
an illustrative example of this distinction. 
Terrain Generators. We generate terrain (Fig. 6) using 
SDF elements derived from fractal noise [68] and simulators 
[6,30,33,62,68]. We evaluate these to a mesh using marching 
cubes [57]. We generate boulders via repeated extrusion, and 
small stones using Blender’s built-in addon. We simulate 
dynamic fluids (Fig. 7) using FLIP [8], sun/sky light using 
the Nishita sky model [66], and weather with Blender’s 
particle system. 
Plants & Underwater Object Generators. We model tree 
growth with random walks and space colonization [73], re- 
sulting in a system with diverse coverage of various trees, 
bushes and even some cacti (Fig. 9). We provide generators 
for a variety of corals (Fig. 10) using Differential Growth 
[67], Laplacian Growth [42], and Reaction-Diffusion [23]. 
We produce Leaves (Fig. 8), Flowers [38], Seaweed, Kelp, 
Mollusks and Jellyfish using geometry node-graphs. 
Surface Scatter Generators. Some natural environments 
are characterized by a dense coverage of smaller objects. 
To this end, we provide several scatter generators, which 
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Figure 10. Random, non cherry-picked underwater objects. 


Figure 11. Random, non cherry-picked surface scatters. Dense 
coverage with procedural assets turns any surface into a convincing 
grassland, sea floor or forest floor environment. 


users can also override any parameter using our Python API 
to achieve fine grained control of data generation. 

Each probabilistic program involves many additional in- 
ternal, low-level degrees of freedom (e.g. the heights of every 
point on a mountain). Randomizing over both the internal 
and external degrees of freedom leads to a distribution of as- 
sets which we sample from for unlimited generation. Tab. 2 
summarizes the number of human-interpretable degrees of 
freedom in Infinigen, with the caveat that the numbers could 
be an over-estimation because not all parameters are fully 
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Figure 12. Creature Generation. Our system automatically generates genomes (a), parts (b), assembly (c), materials (d) and animation rigs 
(e). On the right, we show random, non cherry-picked samples from our Carnivore, Herbivore, Bird, Beetle, and Fish generators. 


Figure 13. Data Generation Pipeline. We procedurally compose a scene layout (a) with random camera poses. We generate all necessary 
assets (b, showing a color per mesh face), and apply procedural materials and displacement (c). Finally, we render a photo-real image (d). 


attachment. We provide generators for 5 classes of realistic 
creature genomes, shown in Fig. 12. We can also combine 
creature parts at random, or interpolate similar genomes. See 
Appendix G.6 for details. 


Each part generator is either a transpiled node-graph, 
or a non-uniform rational basis spline (NURBS). NURBS 
parameter-space is high-dimensional, so we randomize 
NURBS parameters under a factorization inspired by lofting, 
composed of deviations from a center curve. To tune the 
random distribution, we modelled 30 example heads and 
bodies, and ensured that our distribution supports them. 


Our system produces high-quality animation rigs, and 
optionally simulates realistic surface folding, sagging and 
motion of creature skin using cloth simulation. For hair, we 
use the transpiler to automate the process of grooming hairs, 
as usually performed by human character artists. 


Dynamic Resolution Scaling. With the camera location 
fixed, we evaluate our procedural assets at precisely the level 
of detail such that each face is < 1px in size when ren- 
dered. This process is visualized in Fig. 14. For most assets, 
this entails evaluating a parametric curve at the given pixel 
size, or using Blender’s built-in subdivision or re-meshing. 
For terrain, we perform Marching Cubes on SDF points in 
spherical coordinates. For densely scattered assets (incl. all 
assets in Fig. 11) we use instancing - that is, we generate 
a fixed number of assets of each type, and reuse them with 
random transforms within a scene. Even with this effort in 
optimization, the average complete scene has 16M polygons. 


Figure 14. Dynamic Resolution Scaling. We show close-up mesh 
visualizations (top) of the same content for three different camera 
distances. Despite differing mesh resolution, no changes are visible 
in the final images (bottom). 


combine one or more existing assets in a dense layer (Fig. 
11). In the forest floor example, we generate fallen tree logs 
by procedurally fracturing entire trees from our tree system. 
Due to space constraints, all specific implementation de- 
tails of the above are available in Appendix G. 
Creature Generators. The genome of each creature is 
represented as a tree data-structure (Fig. 12 a). This reflects 
the topology of real creatures, whose limbs don’t form closed 
loops. Nodes contain part parameters, and edges specify part 
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Figure 15. Resource requirements to create a pair of stereo 1080p images using Infinigen. 


Ours Liet al. [48] | Sceneflow [60] TartanAir [90] FallingThings [86] 


Figure 16. Qualitative results on the Plant and Australia Mid- 
dlebury [74] test images. RAFT-Stereo trained using Infinigen 
generalizes well to images with natural objects. 


4. Experiments 


To evaluate Infinigen, we produced 30K image pairs with 
ground truth for rectified stereo matching. We train RAFT- 
Stereo [54] on these images from scratch and compare results 
on the Middlebury validation (Tab. 3) and test sets (Fig. 16). 
See Appendix for qualitative results on in-the-wild natural 
photographs. 
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Training Dataset | Adirondack Jadeplant Motorcycle Piano Pipes Playroom Playtable Recycle Shelves Vintage | Avg 


FallingThings [86] 8.3 43.3 12.3 18.2 25.3 29.7 50.0 10.4 43.3 45.6 |28.6 
Sintel-Stereo [9] 35.7 62.9 31.1 24.1 31.9 41.7 60.1 30.8 55.8 76.1 0 
HR-VS [94] 43.5 43.2 17.0 29.6 32.1 34.6 68.4 24.7 57.4 349 385 
Li et al. [48] 239 80.2 40.7 32.0 40.3 49.1 67.5 36.6 51.7 42.3 146.4 
SceneFlow [60] 7.4 41.3 14.9 16.2 33.3 18.8 38.6 10.2 39.1 29.9 |25.0 
155 45.1 18.1 12.9 4 25.6 51.0 20.9 49.1 28.2 |29.5 
17.1 59.7 21.3 23.8 358 33.9 36.4 20.0 33.4 44.1 [32.5 
7.4 35.2 15.2 20.7 7 29.3 50.0 12.6 55.1 46.9 |29.7 


Tartan Air [90] 
InStereo2K [5] 
Ours (Infinigen 30K) 


Table 3. Bad 3.0 (%) | error on the Middlebury [74] validation 
set. Infinigen generalizes well to natural objects (e.g. Jadeplant). 
However, natural objects contain very few planar or textureless 
surfaces; models trained exclusively on natural objects generalize 
less well on Middlebury’s indoor scenes. 


Image Rendering & Ground Truth Extraction. We ren- 
der images using Cycles, Blender’s physically-based path 
tracing renderer. We provide code to extract ground truth for 
common tasks, visualized in Fig. 2. 

Cycles individually traces photons of light to accurately 
simulate diffuse and specular reflection, transparent refrac- 
tion and volumetric effects. We render at 1920 x 1080 res- 
olution using 10,000 random samples per-pixel, which is 
standard for blender artists and ensures almost no sampling 
noise in the final image. 

Prior datasets [9, 24, 27,29,48] rely on blender’s built-in 
render-passes to obtain dense ground truth. However, these 
rendering passes are a byproduct of the rendering pipeline 
and not intended for training neural networks. Specifically, 
they are often incorrect due to translucent surfaces, volumet- 
ric effects, motion blur, focus blur or sampling noise. See 
Appendix C.2 for examples of these issues. 

Instead, we provide OpenGL code to extract surface nor- 
mals, depth and occlusion boundaries from the mesh directly 
without relying on blender. This solution has many benefits 
in addition to its accuracy. Users can exclude objects not 
relevant to their task (e.g. water, clouds, or any other object) 
independently of whether they are rendered. Many annota- 
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Middlebury Dataset. We evaluate our trained model on 
the Middlebury Dataset [74], which is a standard evaluation 
benchmark for stereo matching. It consists of megapixel 
image-pairs of cluttered indoor spaces, 10 with public 
ground-truth and 10 without. This benchmark is challenging 
due to its abundance of objects, textureless surfaces, and 
thin structures. In Tab. B, we see that our Infinigen-trained 
model struggles on images with exclusively artificial objects 
but performs well on the only image with natural objects 
(Jadeplant). In Fig H, we qualitatively evaluate our model on 
Middlebury images without public ground-truth and observe 
that our model generalizes well to the natural scenes. 


Training Dataset Bad 3.0 (%) | | # Image Pairs 
InStereo2K [5] 25.282 2K 
FallingThings [86] 12.199 62K 
Sintel-Stereo [9] 10.253 2K 

HR-VS [94] 9.296 780 

Li et al. [48] 9.271 177K 
SceneFlow [60] 7.837 35K 
TartanAir [90] 6.504 296K 

Ours (Infinigen 30K) 5.527 31K 


Table A. Performance on 400 independent Infinigen evaluation 
scenes. No assets are shared between our Infinigen training and 
evaluation scenes. 


Synthetic Images of Natural Scenes. Although quantita- 
tive evaluation is not currently feasible on real-world natural 
scenes, it can be done using synthetic natural scenes from 
Infinigen, with the caveat that we rely on the assumption 
that performance on Infinigen images is a good proxy to real- 
world performance, as suggested by the qualitative results in 
Fig. G. In Tab. A, we evaluate our Infinigen-trained model 
on an independent set of 400 image pairs from Infinigen. 
No assets are shared between our training and evaluation 
sets. Tab. A also compares the model trained on Infinigen 
to models trained on other datasets. We see that the model 
trained on Infinigen data has a significant lower error than 
those trained on other datasets. These quantitative results 
suggest that the distribution of Infinigen images is signifi- 
cantly different from existing datasets and that Infinigen can 
serve as a useful supplement to existing datasets. 


C. Dataset Generation 
C.1. Image Rendering 


We render images using Cycles, Blender’s physically- 
based path tracing renderer. Cycles individually traces pho- 
tons of light to accurately simulate diffuse and specular 
reflection, transparent refraction and volumetric effects. We 
render at 1920 x 1080 resolution using 10, 000 random sam- 
ples per-pixel. 


Appendix 
A. Figures Extended 


First, we provide a large sample of random, non-cherry- 
picked RGB images from our dataset generator ( Figs. A, 
B, C, D). Both this sample and Fig. 1 in the main paper are 
randomly selected, however unlike in Fig. 1, here we do 
not group the images by scene type. We limit the sample to 
576 JPEG images of resolution 960x540 only due to space 
constraints - in practice we can generate infinitely many such 
images as 1080P PNGs, with a full suite of accompanying 
ground truth data. 

Second, we show an extended random sample of our 
terrain system (Fig. E F), grouped by scene type as in Fig. 6 
of the main paper. 

Please visit infinigen.org for videos, code, and extended 
hi-res. random samples. 


B. Experiments 


To validate the usefulness of the generated data, we pro- 

duce 30K image pairs for training rectified stereo matching. 
We train RAFT-Stereo [54] on these images from scratch 
and compare against the same architecture trained on other 
synthetic datasets. Models are trained for 200k steps using 
the same hyper-parameters from [54]. 
Real Images of Natural Scenes. Because images from 
Infinigen consist of entirely of natural scenes and are de- 
void of any human-made objects, we expect models trained 
on Infinigen data to perform better on images with natural 
environments and worse on images without (e.g. indoor 
scenes). However, quantitative evaluation on real-world nat- 
ural scenes is currently infeasible because there does not 
exist a real-world benchmark that evaluates depth estima- 
tion for natural scenes. Existing real-world benchmarks 
consist almost entirely of images of indoor environments 
dominated by artificial objects. In addition, it is challeng- 
ing to obtain 3D ground truth for real-world natural scenes, 
because real-world natural scenes are often highly complex 
and non-static (e.g. moving tree leaves and animals), making 
high-resolution laser-based 3D scanning impractical. 

Due to the difficulty of obtaining 3D ground truth for real 
images of natural scenes, we perform qualitative evaluation 
instead. We collected high-resolution rectified stereo im- 
ages of real-world natural scenes using the ZED 2 Stereo 
Camera [79] and visualize the predicted disparity maps from 
RAFT-Stereo [54] in Fig. G. Our results show that a model 
trained entirely on synthetic scenes from Infinigen can per- 
form well on real images of natural scenes zero-shot. The 
model trained on Infinigen data produces noticeably better 
results than models trained on existing datasets, suggesting 
that Infinigen is useful in that it provides training data for a 
domain that is poorly covered by existing datasets. 


Figure A. 576 randomly generated, non-cherry-picked images produced by our system (Part 1 of 4). Images are compressed due to space 
constraints - please see 


Figure 8. 576 randomly generated, non-cherry-picked images produced by our system (Part 2 of 4). Images are compressed due to space 
constraints - please see 


Figure C. 576 randomly generated, non-cherry-picked images produced by our system (Part 3 of 4). Images are compressed due to space 
constraints - please see 


Figure D. 576 randomly generated, non-cherry-picked images produced by our system (Part 4 of 4). Images are compressed due to space 
constraints - please see 


Pipes Playroom Playtable Recycle Shelves Vintage | Avg 
25.3 29.7 50.0 10.4 43.3 45.6 28.6 
31.9 41.7 60.1 30.8 55.8 76.1 45.0 
32.1 34.6 68.4 24.7 57.4 34.9 38.5 
40.3 49.1 67.5 36.6 51.7 42.3 46.4 
33.3 18.8 38.6 10.2 39.1 29.9 25.0 
28.4 25.6 51.0 20.9 49.1 28.2 29.5 
35.8 33.9 36.4 20.0 33.4 44.1 32.5 
24.7 29.3 50.0 12.6 55.1 46.9 29.7 


Training Dataset Adirondack Jadeplant Motorcycle Piano 
FallingThings [86] 8.3 43.3 12.3 18.2 
Sintel-Stereo [9] 35.7 62.9 31.1 24.1 
HR-VS [94] 43.5 43.2 17.0 29.6 
Li et al. [48] 23.9 80.2 40.7 32.0 
SceneFlow [60] 7.4 41.3 14.9 16.2 
TartanAir [90] 15.5 45.1 18.1 12.9 
InStereo2K [5] 17.1 59.7 21.3 23.8 
Ours (Infinigen 30K) 7.4 35.2 152 20.7 


Table B. Bad 3.0 (%) ل‎ error on the Middlebury [74] validation set. Infinigen helps models generalize to images with natural objects (e.g. 
Jadeplant). On the other hand, natural objects contain very few planar or texture-less surfaces; models trained exclusively on natural objects 


open-source, we anticipate that users will generate count- 
less task-specific ground truth not covered above via simple 
extensions to our codebase. 


C.3. Runtime 


We benchmark Infinigen on 2 Intel(R) Xeon(R) Silver 

4114 © 2.20GHz CPUs and 1 NVidia-GPU (one of GTX- 
1080, RTX-[2080, 6000, a6000] or a40) across 1000 indepen- 
dent trials. We show the distribution in Fig. L. The average 
wall time to produce a pair of 1080p images is 3.5 hours. 
About one hour of this uses a GPU, for rendering specifically. 
More CPUs per-image-pair will decrease the wall-time sig- 
nificantly as will faster CPUs. Our system also uses about 
24Gb of memory on average. 
Pre-generated Infinigen Data. To maximize the accessi- 
bility of our system we will provide a large number of pre- 
generated assets, videos, and images from Infinigen upon 
acceptance. 


D. Interpretable Degrees of Freedom 


We attempted to estimate the complexity of our procedu- 
ral system by counting the number of human-interpretable 
parameters, as shown in the per-category totals in Table 2 
of the main paper. Here, we provide a more granular break 
down of what named parameters contributed to these results. 


Counting Method We seek to provide a conservative es- 
timate of the expressive capacity of our system. We only 
count distinct human-understandable parameters. We also 
only include parameters that are useful, that is if it can be 
randomized within some neighborhood and produce notice- 
ably different but still photorealistic assets. Each row of 
Tabs. C—J gives the names of all Intepretable DOF that are 
relevant to some set of Generators. 

We exclude trivial transformations such as scaling, rotat- 
ing and translating an asset. We include absolute sizes such 
as ’Length’ or ’Radius’ as parameters only when their ratio 
to some other part of the scene is significant, such as the 
leg to body ratio of a creature, or the ratio of a sand dune’s 
height to width. 


can generalize less well on indoor datasets like Middlebury. 


C.2. Ground Truth Extraction 


We develop custom code for extracting ground-truth di- 
rectly from the geometry. Prior datasets [9,24,27,29,48] rely 
on blender’s built-in render-passes to obtain dense ground 
truth. However, these rendering passes are a byproduct of the 
rendering pipeline and not intended for training ML models. 
Specifically, they are incorrect for translucent surfaces, volu- 
metric effects, or when motion blur, focus blur or sampling 
noise are present. 

We contribute OpenGL code to extract surface normals, 
depth, segmentation masks, and occlusion boundaries from 
the mesh directly without relying on blender. 

Depth. We show several examples of our depth maps in 
Fig. I. In Fig. K, we visualize the alternative approach 
of naively producing depth using blender’s built-in render 
passes. 

Occlusion Boundaries. We compute occlusion boundaries 
using the mesh geometry. Blender does not natively produce 
occlusion boundaries, and we are not aware of any other 
synthetic dataset or generator which provides exact occlusion 
boundaries. 

Surface Normals. We compute surface normals by fitting a 
plane to the local depth map around each pixel. Sampling 
the geometry directly instead can lead to aliasing on high- 
frequency surfaces (e.g. grass). 

The size of the plane used to fit the local depth map 
is configurable, effectively changing the resolution of the 
surface normals. We can also configure our sampling 
operation to exclude values which cannot be reached from 
the center of each plane without crossing an occlusion 
boundary; planes with fewer than 3 samples are marked 
as invalid. We show these occlusion-augmented surface 
normals in Fig. I. These surface normals appear only 
surfaces with sparse occlusion boundaries, and exclude 
surfaces like grass, moss, lichen, etc. 

Segmentation Masks. We compute instance segmentation 
masks for all objects in the scene, shown in Fig. I. Object 
meta-data can be used to group certain objects together arbi- 
trarily (e.g. all grass gets the same label, a single tree gets 
one label, etc). 

Customizable. Since our system is controllable and fully 


to appropriate random number generators into the resulting 
code. 


F. Scene Composition Details 


Our scenes are not individually staged images - for 
each image, we produce a map of an expansive and view- 
consistent world. One can select any camera pose or se- 
quence of poses, which allows for video and other multi-view 
data generation. 

To allow this, we start by sampling a full-scene ground 
surface. This is low-resolution, but is sufficient to approx- 
imate the surface of the terrain for the purpose of placing 
objects. We determine surface points using Poisson-Disc 
Sampling. This avoids the majority of asset-asset intersec- 
tions. We modulate point density using procedural masks 
based on surface normals, Perlin noise and terrain attributes. 
Asset rotations are determined uniformly at random. Our fi- 
nal coarse global map is represented as a lightweight blender 
file with intuitive editable placeholders to represent where 
assets will be spawned in later steps of the pipeline. 

We provide a library of 11 optional configuration files to 
modify scene composition, namely Arctic, Coast, Canyon, 
Cave, Cliff, Desert, Forest, Mountain, Plains, River and 
Underwater. Each encodes simple natural priors such as 
*Cacti often grow in deserts” or ”Trees are less dense on 
mountains”, expressed as modifications to these mask and 
density parameters. More complex relations, like predators 
and prey avoiding each other, or plants not growing in shaded 
areas, are not currently captured. 


E.1. Camera Selection 


We select camera viewpoints with simple heuristics de- 
signed to match the perspective of a creature or person, 
which are as follow: 


Height above Ground In order to match the perspective 
of a creature or person, we sample the camera height above 
ground from a Gaussian distribution. (with the exception 
that in terrain-only scenes sometimes this height is higher to 
highlight some landscape features) 


Minimum Distance To avoid being blocked by a close-up 
object and over-subdividing the geometry (which is expen- 
sive), we select camera views with a minimum distance 
threshold to all objects. 


Coverage In order to avoid overly barren images and to 
highlight interesting features, we may select views such 
that a certain terrain component, e.g. a river, is visible. 
Specifically, we may require that the camera view has pixels 
from a specific terrain component or object type within a 
certain range. 


Many of our material generators involve randomly gener- 
ating colors using random HSV coordinates. This has three 
degrees of freedom, but out of caution we treat each color as 
one parameter. Usually, one or more HSV coordinates are 
restricted to a relatively narrow range, so this one parameter 
represents the value of the remaining axis. Equivalently, it 
can be imagined as a discrete parameter specifying some 
named color-palette to draw the color from. Some genera- 
tors also contain compact functions or parametric mapping 
curves, each usually with 3-5 control points. We treat each 
curve as one parameter, as the effect of adjusting any one 
handle is subtle. 


Results In total, we counted 182 procedural asset genera- 
tors with a sum of 1070 distinct interpretable parameters / 
Degrees-of-Freedom. We provide the full list of these named 
parameters as Tables C—J, placed at the end of this document 
as they fill several pages. 


E. Transpiler 


Code Generation In the simplest case, the transpiler is a 
recursive operation which performs a post-order traversal 
of Blender’s internal representation of a node-graph, and 
produces a python statement defining each as a function call 
of it’s children. We automatically handle and abstract away 
many edge cases present in the underlying node tree, such as 
enabled/disabled inputs, multi-input sockets and more. This 
procedure also supports all forms of blender nodes, including 
shader nodes, geometry nodes, light nodes and compositor 
nodes. 


Doubly-recursive parsing Blender’s node-graphs support 
many systems by which node-graphs can contain and de- 
pend on one another. Most often this is in the form of a 
node-group, such as the dark green boxes titled Sunflow- 
erSeedCenter and PhylloPoints in Fig. M. Node-groups are 
user-defined nodes, containing an independent node-tree as 
an implementation. These are equivalent to functions in 
a typical programming languages, so whenever one is en- 
countered, the transpiler will invoke itself on the node-graph 
implementation and package the result as a python func- 
tion, before calling that function in the parent node-graph. 
In a similar fashion, we often use SetMaterial nodes that 
reference a shader node-graph to be transpiled as a function. 


Probability Distribution Annotation Node-graphs con- 
tain many internal parameters, which are artistically tuned by 
the user to produce a desired result. We provide a minimal 
interface for users to also specify the distribution of these pa- 
rameters by writing small strings in the node name. See, for 
example, the red nodes in Fig. M. When annotations of this 
format are detected, the transpiler automatically inserts calls 


F.2.2 Parametric Surface Resolution Scaling 


NURBS and other parametric surfaces support evaluation at 
any mesh resolution. This is achieved by specifying some 
du, dv as step sizes in parameter space. We provide heuris- 
tics to compute appropriate values for these step sizes such 
that the resulting mesh achieves a given max pixel size. 


F.2.3 Subdivision and Remeshing 


All other assets rely on established Subdivision and Voxel- 
Remeshing algorithms to create dense pixel-size geometry. 
We provide heuristics to compute appropriate subdivision 
levels. Voxel Remeshing is time intensive but is especially 
useful for creatures and trees whose geometry can otherwise 
self-intersect or contain stretched faces. 


G. Asset Implementation Details 
G.1. Materials 


Our materials are composed of a shader and a local ge- 
ometry template. The shader procedurally generates realistic 
color, roughness, specularity, metallicity, subsurface scat- 
tering, and translucence parameters. The local geometry 
template generates corresponding geometric detail. Most 
often, the local geometry template simply computes a scalar 
field over the underlying mesh vertices, and displaces them 
along their normals to form a rough texture. 


G.1.1 Terrain Materials 


The majority of our terrain materials (including Mountain, 
Granite, Snow, Stone, Ice) operate by combining many oc- 
taves of Perlin Noise to form geometric texture, before ap- 
plying a mostly flat color. Some, such as some random 
variations of Mountain, create layered color masks as a func- 
tion of the world Z coordinate. Others such as Sand follow a 
similar scheme, but with a procedural Wave texture instead 
of Perlin Noise. 

Our Mud and Sandstone materials are particular expres- 
sive. Mud procedurally generates puddles and slick ground 
by altering color, displacement and roughness jointly. Sand- 
stone generates realistic layered sedimentary rock using 
noise functions and modular arithmetic based on the world 
Z coordinate. 

Lava uses displacement from Perlin noise, Fl-smooth 
Voronoi texture, and wave texture with varying scales. The 
shader uses Perlin noise, Voronoi texture to model hot and 
rocky parts, mixed with blackbody emission and a principled 
BSDF. 

Fire and Smoke are comprised of multiple volumetric 
shaders. The first shader uses a principled volume shader 
with blackbody radiation whose intensity is sampled based 
on the amount of flame and smoke density. The smoke 


Standard Deviation of Depth We compute the variance 
of pixel-wise depth values, and choose the viewpoint out of 
ten random samples with the largest variance to favor more 
interesting content. 


F.2. Dynamic Resolution 


In Fig. J, we show a visualization of triangle sizes in cm? 


and in pixels as viewed from the camera. Face size in meters 
increases proportionally to depth, whereas face size in pixels 
remains approximately constant. 


F.2.1 Spherical Marching Cubes 


To generate a mesh for a specific camera view, we must 
extract a mesh representation of the terrain which contains 
dense pixel-size faces. Classical Marching Cubes struggles 
to achieve this, as it evaluates the SDF at fixed intervals, 
which results in too-sparse geometry near the camera and 
too-dense geometry in the far distance. Spherical Marching 
Cubes is our novel adaptation of this classic algorithm to 
operate in spherical coordinates, which automatically creates 
dense geometry near the camera where it is most needed, 
thereby preventing waste and drastically improving perfor- 
mance. 
Spherical Marching Cube algorithm works as follows: 


Low Resolution Step We divide the visible 3D space 
(within the frustrum camera and a certain distance range 
(dmin, dmax)) into M x N x R blocks in spherical coor- 
dinates, uniform in 0 and ¢ and logarithmically in r. We 
convert these to cartesian coordinates and evaluate the SDF 
as usual. This yields a tensor of SDF values where grid cells 
near dmax represent larger regions of space than those close 
to the camera, thereby saving space. 


Visible Block Search Step We use this SDF tensor to find 
the closest block for each pixel with an SDF zero crossing, 
resulting in an approximate terrain-only depth-map. These 
blocks are low-resolution and may contain holes, so we 
check them again with dense SDF queries to determine any 
farther away visible blocks. 


High Resolution Step Finally, we evaluate dense pixel- 
size SDF queries for all visible blocks, and produce a final- 
ized mesh with Marching Cubes. Theoretically the final size 
of this mesh can still be proportional to cube of the resolu- 
tion, but in practice we find it is proportional to square. This 
step and the previous step can be performed iteratively to 
prevent all potential holes. 


Out-of-view Part The out-of-view part of the terrain is 
needed for lighting effects but done with low resolution. 


Carnivore and Herbivore Carnivores and Herbivores are 
also equipped with particle fur and therefore color-only mate- 
rials. We provide a Tiger material, which makes short stripes 
by cutting a small-scale wave texture with another larger- 
scale wave texture. Our Giraffe material builds spots by 
subtracting a Fl-smooth voronoi texture from a F1 voronoi 
texture with the same scale. Three other spot materials build 
scattered and sometimes overlapped spots using noise tex- 
tures and voronoi textures. Our three reptile materials build 
colormaps inspired by different kinds of reptiles. We pick 
their colors to mimic the reference reptile images, and then 
randomize in a small neighborhood. 


Beetle We provide a Chitin material emulating the mate- 
rial of real beetles and other insects. It uses a computed 


*Pointiness” attribute to highlight the boundaries with sharp 


curvature. We color the insect head and sharp boundaries 
black, and other areas dark red or brown. 


Bone, Beak & Horns Bone is most often used for animal 
claws and teeth. It starts wit a white to light gray color, 
before creating small pits in two different scales using noise 
and voronoi textures. Our Horn material samples light brown 
to dark brown colors, with a similar mechanism for pits and 
scratches. Our Beak material replicates realistic bird beaks 
by sampling a random yellow/orange/black color gradient 
along. Noise textures are used to add some small pits on it. 
Principled BSDF shaders are added to make the beaks more 
shiny. 


Slime We provide a material titled Slimy, which replicates 
folded shiny flesh similar to a ”Blobfish”. It builds thin and 
distorted stripe displacement using a noise texture followed 
by a wave texture. We create the shiny appearance by assign- 
ing high specularity and subsurface scattering parameters. 


G.1.4 Other material 


Metal We build a silver material and an aluminium ma- 
terial. These use Blender’s Principled BSDF shader with 
Metallic set to 1. They also have sparse sunken displacement 
from a noise texture. 


density is randomly sampled. The second shader imitates a 
high detail image captured with fast shutter speed and low 
exposure. The detail is brought out based on contrasting 
different regions of temperature and adding Perlin noise. 
The colors are shades of orange. 

All our water materials feature glass-like surfaces with 
physically accurate Index-of-Refraction, combined with a 
volumetric scattering shader to simulate realistic underwa- 
ter light bounces. The Water shader uses these effects with 
geometry untouched (for use with simulators), wherase Wa- 
ter Surface assumes the base geometry is a plane and adds 
geometric ripples. 


G.1.2 Plant Materials 


Our plant materials feature geometric and color variation 
using procedural Stucci, Marble and Shot Noise textures. 
We provide color pallettes for realistic plant and coral col- 
ors, and produce variations on these using Musgrave noise. 
All leafy plants feature realistic transmission and roughness 
properties, to simulate light filtering through translucent 
waxy leaves. Our Bark and Bark Birch simulate bark ex- 
pansion and fracturing using voronoi and perlin noise to 
generate displacements. 


G.1.3 Creature Material 


Fish We create two fish materials, a goldfish material and 
a regular fish material. Each material consists of two parts, 
a fish body material and a fish fin material. The fish body 
material creates the displacement of fish scales. To build 
the pattern of fish scales, we create a grid and fill every two 
adjacent grids in one column with a half circle. Then we 
move up the half circles in the even columns by one grid. 
The colors of regular fish are guided by one wave texture 
and two noise textures. The colors of goldfish are guided 
by two noise textures and sampled between red and yellow. 
The colors of fish bellies are usually white. A fish fin is 
created by adding periodical bumps to round planes. The 
weights of bump displacement are decreasing away from the 
fish body. Dorsal fins are sometimes serrated. The goldfish 
fins are translucent by mixing a principled BSDF shader, a 
transparent shader and a translucent shader. 


Bird Since our generated birds have particle fur, the bird 
material need only create a colormap. We create masks that 
highlight different body parts, including heads, necks, upper 
bodies, lower bodies and wings. Each part is assigned two 
similar colors guided by a noise texture. Our Bird material 
generator has two modes of colors, one with light colored 
heads and dark color bodies (emulating bald eagles and 
falcons), one with dark color heads and light color bodies 
(emulating ducks and common birds). 


various types of tiles can be used alone or in combination to 
generate infinite scenes including mountains, rivers, coastal 
beaches, volcanos, cliffs, and icebergs. Fig. N(c) shows an 
example of a iceberg tile. Tiles are combined by repeating 
them with random rotations and smoothing any boundaries. 
The resulting terrain element is still represented as an SDF. 


Caves Our terrain includes extensive cave systems. These 
are cut out from other SDF elements before the mesh is 
created. The cave passages are generated procedurally us- 
ing an Lindenmayer-System with probabilistic rules, where 
each rule controls the direction and movement of a virtual 
turtle [59]. These rules include turns, elevation changes, 
forks, and others. These passages have varying cross section 
shapes and are unioned with each other, leading to compli- 
cated cave systems with features from small gaps to large 
caverns. One can intuitively tune the cavern size, tunnel 
frequency and fork frequency by adjusting the likelihoods of 
various random rules. See Fig. N(d) for an interior view of a 
cave and Fig. N(e) for how it cut from mountains. 


Floating islands Besides natural scenes, we also have 
fantastical terrain elements, e.g., floating islands (Fig. N(f)) 
by gluing mountains and upside down mountains together. 


0.2.22 Boulders 


We start off with a mesh built from convex hull of around 
32 vertices. We randomly select some faces that are large 
enough, so that they can be extruded and scaled. We repeat 
this process for two levels of extrusions: large and small. 
Finally we bevel the mesh, and add a displacement based on 
high- and low-frequency Voronoi textures. After generating 
the base mesh, boulders are given a rock surface and option- 
ally a rock cover surface. See Sec. G.1 for details. Boulders 
are placed on the terrain mesh as placeholders. 


G.2.3 Fluid 


Most water and lava in our scenes form relatively static pools 
and lakes — these are handled by generating a flat plane and 
applying a Surface Water or Lava material from Sec. G.1. 


Ocean We simulate dyhnamic oceans Blender’s built-in 
modifier to generate displacements on top of a water plane. 
This simulation is finite domain, so we tile it as described in 
Tiled Landscape above. 


Dynamic Fluid Simulation We generate dynamic water 
and lava simulations using Blender’s built-in Fluid-Implicit- 
Particle (FLIP) plugin [8]. They can either simulated on 
a small region of the terrain or work together with Tiled 
Landscape, e.g., Volcanos to be reused as instances. The 


G.2. Terrain 


Figure N. Terrain elements, including a) wind-eroded rocks, b) 
Voronoi rocks, c) Tiled landscapes, d/e) Caves, and f) Floating 
Islands. 


0.2.1 Terrain Elements 


The main part of terrain (Ground Surface) is composed of 
a set of terrain elements represented by Signed Distance 
Functions (SDF). Using SDF has the following advantages: 


٠ SDF is written in C/C++ language and can be compiled 
either in CPU (with OpenMP speedup) or CUDA to 
achieve parallelization. 


٠ SDF can be evaluated at arbitrary precision and range, 
producing a mesh with arbitrary details and extent. 


٠ SDF is flexible for composition. Boolean operations are 
just minimum and maximum of SDF. To put cellular 
rocks onto a mountain, we just query whether each 
corresponding Voronoi center has positive SDF of the 
mountain. 


Terrain elements include: 


Wind Eroded Rocks They are made out of Perlin 
noise [69] from FastNoise Lite [68] with domain warping 
adapted from an article [22]. See Fig. N(a) for an example. 


Voronoi Rocks These are made from Cellular Noise 
(Voronoi Noise) from FastNoise Lite. They take another 
terrain element as input and generate cells that are on the 
surface of the given terrain element and add noisy gaps to the 
cells. See Fig. N(b) for an example. We utilize this element 
to replicate fragmented rubble, small rocks and even tiny 
sand grains. 


Tiled Landscape Complementary to above, we generate 
heterogeneous terrain elements as finite domain tiles. First, 
we generate a primitive tile using the A.N.T. Landscape 


Add-on in Blender [1 1], or a function from FastNoise Lite. 


This tile has a finite size, so we can simulate various natural 
process on it, e.g., erosion by SoilMachine [62], snowfall 
by diffusion algorithm in Landlab [33] [6] [30]. Finally, 


Leaves Our leaf generation system covers common leaf 
types including oval-shaped (which covers most of the broad- 
leaved trees), maple, ginkgo, and pine twigs. 

For oval-shaped leaves, we start by subdividing a 2D 
plane mesh finely into grids, and evaluate each grid location 
with various noise functions. The leaf boundary defined by a 
set of control points of a Blender curve node, which specifies 
the width of the leaf at each location along the main stem. 
We then delete the unused geometry to get a rough shape of 
the leaf. To create veins, we use a 1D Voronoi Texture node 
on a rotated coordinate system, to model the angle between 
the veins and the main stem. We extrude the veins and pass 
the height values into the Shader Node Tree to assign them 
different colors. We then add jigsaw-like patterns on the 
boundaries of the leaf, and create the cell structures using a 
2D Voronoi Texture node. We finally add wrapping effect to 
the leaf with another curve node. 

The maple leaves and ginkgo leaves are created in a very 
similar way as the oval-shaped leaves, except that we use 
polar coordinates to model rotational symmetric patterns, 
and the shape of the leaves is defined by a curve node in the 
polar coordinates. 

Pine twigs are created by placing pine needles on a main 
stem, whose curvature and length are randomized. 

We use a mixture of translucent, diffuse, glossy BSDF to 
represent the leaf materials. The base colors are randomly 
sampled in the HSV space, and the distribution is tuned for 
each season (e.g., more yellow and red colors for autumn). 


FlowerPlant We create the stem of a flower plant with 
a curve line together with an cylindrical geometry. The 
radius of the stem gradually shrinks from the bottom to the 
top. Leaves are randomly attached to resampled points on 
the curve line with random orientations and scales within 
a reasonable range. Each leaf is sampled from a pool of 
leaf-like meshes. Additionally, we add extra branches to 
the main stem to mimic the forked shape of flower plants. 
Flowers are attached at the top of the stem and the top of the 
branches. Additionally, the stem is randomly rotated w.r.t 
the top point along all axes to generate curly looks of natural 
flower plants. 


G.3.2 Trees & Bushes 


We create a tree with the following steps: 1) Skeleton Cre- 
ation 2) Skinning 3) Leaves Placement. 

Skeleton Creation. This step creates a directed tree-graph 
to represent the skeleton of a tree. Starting from a graph 
containing a single root node, we apply Recursive Paths to 
grow the tree. Specifically, in each growing step, a new node 
is added as the child of the current leaf node. We computed 
the growing direction of the new node as the weighted sum 
of the previous direction (representing the momentum) and 


simulations are parameterized by sampled values of vortic- 
ity, viscosity, surface tension, flow amount, and other liquid 
parameters. We simulate fire and smoke simulations us- 
ing Blender’s particle simulator. Our system allows for 1) 
Simulating fire and smoke on small random regions on the 
terrain or 2) Choosing arbitrary meshed assets on the scene 
to be set on fire or emit smoke. The simulations interact 
with turbulent and laminar wind flows added on the scene. 
While these simulators are provided with Blender, we con- 
tribute significant engineering effort to automate their use. 
Typically, users manually set up individual simulations and 
execute them through the UI - we do so programmatically 
and at large scale. 


G.2.4 Weather 


We provide procedural SDF functions for 4 realistic cate- 
gories of clouds, each implemented as node-graphs. We 
implement rain and snow using Blender’s particle system 
and wind simulation. We also apply atmospheric volume 
scattering to the entire scene to create haze, fog etc. 


G.2.5 Lighting 


The majority of scenes are lit only by the sun and sky. We 
simulate these using the built in Nishita [66] sky model, with 
randomized parameters for the Sun’s position and brightness, 
as well as atmospheric parameters. In cave scenes, we place 
glowing gemstones as natural proxies for point lights. In 
underwater scenes, ray-traced refractive caustics are too 
costly at render-time, so we substitute textured spot lights. 
Finally, we provide an option to attach a virtual flash light or 
area light to the camera, to simulate a human or robot with 
an attached light. 


G.3. Plants 
00.3.1 Leaves, Flowers & Pinecones 


Pinecones Pinecones are the woody seed-bearing organs 
for conifers, which features scales and bracts arranged 
around a central axis, as shown in Fig. U b). Pinecones 
are made from individual buds. Each buds are sculpted from 
a mesh circle, with its left most point chosen as the origin. 
The vertices are displaced along the axis to the origin as well 
as along the Z axis, with scale designated by the direction 
from the origin to that point. We then create a mesh line 
along the Z-axis to form the stem of the pinecone. Pinecone 
buds are distributed on the axis from bottom to up with a 
decreasing scale and changing rotation. The rotation is com- 
posed of two parts: one along the Z axis that spread the buds 
around the pinecone, another along the X axis the gradually 
point the buds upwards. Pine shaders are made from Princi- 
pled BSDF of a single color. Pinecones are scatters on the 
terrain mesh surfaces. 


shape and tentacles growing from the pointy vertices of the 
cactus body. We implement globular cactus by first creating 
a star-like 2D mesh as its cross section. We then use geome- 
try nodes to rotate, translate and scale it along the Z-axis at 
the same time, converting it to a 3D mesh. The rotation of 
the cross section mesh contributes to the desired tilt of the 
cactus body, and the scale determines the general shape of 
cactus. Finally the cactus is deformed and scale along the X 
and Y axis. For Globular Cactus, spikes are distributed on 
the the pointy vertices generated from the star and on the top 
most part of the cactus. 


Columnar Cactus is modeled after cactus from genus 
Cereus, as shown in Fig. Pb). It features an elongated body 
with a torch-like shape. We first generate the skeleton using 
our tree skeleton generation method. This time we choose 
a configuration with only two levels, both with a smaller 
momentum in path generation and a large drag towards the 
positive Z-axis that finally makes the cactus pointing up. 
From the cactus skeleton, we convert it into a 3D mesh with 
geometry nodes that moves a star mesh along all the splines 
in the tree-like skeleton, with the top end of each path having 
a smaller radius. For Columnar Cactus, spikes are distributed 
on the the pointy vertices generated from the star and on the 
top most part of the cactus. 


Pricky pear Cactus is modeled after cactus from genus 
Opuntia, as shown in Fig. Pc). It features pear-like cactus 
part extruded from another one. We create individual cactus 
parts similar to the one in Globular cactus. We it down in 
Y-direction and rotated it along the Z-axis so that it becomes 
almost planar and pear-shaped. These individual cactus parts 
are stacked on top of each other with an angle recursively 
to make the whole cactus branching like a tree. For Pricky 
Pear Cactus, we distribute spikes on both the front and the 
back faces of the cactus. 


Cactus spikes After the main body of a cactus is created, 
we apply medium-frequency displacement on its surface 
and add the spike according to specifications. The low-poly 
spikes are made from several straight skeletons generated by 
the tree generation system, and are distributed on the selected 
areas of the cactus with some minimal distance between two 
instances. 


G.3.4 Fern 


We create 2-pinnate fern (fern) in Infinigen, as it is common 
in nature. Each fern is composed of a random number of 
pinnae with random orientations above the ground. Several 
instances of fern pinnae are illustrated in Fig.Q. 


a random unit vector (representing the randomness). We 
further add an optional pulling direction as a bias to the 
growing direction, to model effects such as gravity. We also 
specify the nodes where the tree should be branching, where 
we add several child nodes and apply Recursive Paths for 
all of them. Finally, given the skeleton created by Recursive 
Paths, we use the space colonization algorithm [73] to create 
dense and natural-looking branches. We scatter attraction 
points uniformly in a cube around the generated skeleton, 
and run the space colonization for a fixed amount of steps. 
Skinning. We convert the skeleton into a Blender curve 
object, and put cylinders around the edges, whose radius 
grows exponentially from the leaf node to the root node. We 
then apply a procedural tree bark material to the surface of 
the cylinders. Instead of using a UV, we directly evaluate 
the values of the bark material in the 3D coordinate space to 
avoid seams. Since the bark patterns are usually anisotropic 
(e.g., strip-like patterns along the principal direction of the 
tree trunks), we use the local coordinate of the cylinders, up 
to some translation to avoid seams in the boundaries. 
Leaves Placement. We place leaves on twigs, and then 
twigs on trees. Twigs are created using the same skele- 
ton creation and skinning methods, with smaller radius and 
more branches. Leaves are placed on the leaf nodes of the 
twig skeleton, with random rotation and possibly multiple 
instances on the same node. We use the same strategy to 
place the twigs on the trees, again with multiple instances 
to make the leaves very dense. For each tree we create 5 
twig templates and reuse them all over the tree by doing 
instancing in Blender, to strike a balance between diversity 
and memory cost. 

Compared to existing tree generation systems such as 
the Sapling-Tree-Gen* addon in Blender and Speed-Tree’ in 
UE4, our tree generation system creates leaves and barks 
using real geometry with much denser polygons, and thus 
provides high-quality ground-truth labels for segmentation 
and depth estimation tasks. We find this generation proce- 
dural very general and flexible, whose parameter space can 
cover a large number of tree species in the real world. 
Bushes We also use this system to create other plants such as 
bushes, which have smaller heights and more branching com- 
pared to trees. Our system models the landscaping bushes 
that are pruned to different shapes by specifying the distribu- 
tions of the attraction points in the space colonization step. 
Our bushes can be of either cone, cube or ball shaped, as 
shown in Fig. O. 


G.3.3 Cactus 


Globular Cactus is modeled after cactus from genus Fe- 
rocactus, as shown in Fig. Pa). It features a barrel-like base 


*https://docs.blender.org/manual/en/latest/addons/add_curve/sapling.htm] 
*https://store.speedtree.com 


G.3.5 + 


Mushrooms are modeled after real mushrooms from genus 
Agaricus and Phallus, as shown in Fig. U a). A mushroom 
is composed of its cap, its stem that supports the cap and 
optionally the skirt that grows from underneath the cap. The 
cap is made from moving a star-like mesh along the Z-axis 
with radius specified by multiple Bezier curves. The star-like 
mesh forms the pleats on the cap surface and gills underneath 
the caps. The stem is made from a Bezier curve skeleton 
and converted to a 3D mesh via geometry nodes, with its 
top end sticking to the cap at an angle. For the mushroom 
skirt, we create an invisible mesh P underneath the cap 
that have a similar cross section as the cap. Following the 
same technique in the Brain Coral (See Appendix G.3.6), we 
shrinkwrap a reaction-diffusion pattern from a icosphere S' 
onto P, which also mapped the A field onto P. This time, 
we remove the vertices whose A field is below a certain 
threshold from the mesh P, so P would have honeycomb- 
like holes. The skirt is placed underneath the cap. Mushroom 
have similar material as corals, but have more white spots 
and lower roughness on its surface. Mushrooms are scattered 
on the terrain surface. 


G.3.6 Corals 


Corals are marine invertebrates that live mostly on the 
seafloor, and are prevalent in underwater scenes. In Infinigen, 
we provide a library of 8 templates for generating different 
classes of corals. Examples of individual coral classes are 
provided in Fig. R. We elaborate these templates for the main 
coral bodies as follows: 


Leather Coral is modeled after corals from genus Sinu- 
laria, as shown in Fig. Ra). It features curvy surfaces that 
folds on it self. We implement Leather Coral using a itera- 
tive mesh generation technique named Differential Growth 
from [67]. The mesh starts off as a mesh circle. At each 
iteration, a force is applied to all of its points. The force at 
each point is composed of an attraction force from its graph 
neighbors, a repulsive force from vertices that are close in 
3D position, a global growth direction, and a noise vector. 
All of these forces can be specified by a set of parameters. 
Such force is applied on all vertices, and the vertices would 
be displaced by a distance proportional to the force. More 
concretely, for all vertices v 


Ax, = xi, =X X fattr + rep + forow i froise 


If the displacement has moved two vertices so that the 
edge connecting these two vertices has its length above a 
certain threshold, such edge would subdivided so that their 
lengths would fall below the threshold, creating new vertices 
on the edges. The aforementioned process defines a growing 


Pinnae Composition Each pinnae consists of a main stem 
and a random number of pinna attached on each side of the 
stem. The length of the pinnae (main stem) is controlled by 
a parameter age. The main stem is first created with a mesh 
line with its points set along the z-axis. Then, the mesh line 
is randomly rotated w.r.t x, y axis around the top point to 
generate the curly look of a fern pinnae. The pinnae’s z-axis 
rotation is slightly different with x, y axis rotation, as we use 
its curvature to represent the age of the fern. In nature, young 
fern pinnae has more curly stem and grown-up fern pinnae 
is usually more stretched and flat. Therefore, we choose the 
scale of the pinnae’s z-axis rotation inverse proportion to the 
age of the pinnae. The external geometry of the main stem 
is a cylindar with its radius gradually shrinks to 0 at the top. 
Noticing that the bottom point is always set to world origin 
despite of the random rotations of the mesh line. 


Before merging multiple pinnae into a fern, each pinnae is 
further curved along the z-axis towards the ground according 
to the desired orientation of the pinnae. We refer this as 
the gravitational rotation induced on the geomtery of pinnae. 
The scale of the gravitational rotation at each mesh line point 
is also proportion to its distance to the world origin, i.e., the 
longer the larger. 


For 2-pinnate fern, the geometry of each pinna is similar 
to pinnae, i.e., a stem and leaves attached on each side. 
Similar to the main stem, we also curve the pinna stem 
randomly w.r.t x, y axis and inverse proportion to the age 
w.r.t the z-axis. In our fern, the leaves are created with simple 
leaves-like geometry. 


The whole pinnae is generated by adding leaf instances 
on pinna stem and then adding pinna instances on the main 
stem. The scale of each leaf instance is scaled to form a 
desired contour shape of the pinna, which is defined to grow 
linearly from tip to bottom with additional random noise. 
For pinna instances on the pinnae, we generate multiple 
distinct pinna versions and then randomly select one for 
each mesh line point. In this way, we can enough irregularity 
and asymmetricity on the pinnae. Furthermore, the pinna 
instances are also scaled according to the desired pinnae 
contour. In our asset, two contour modes are use. One grows 
linearly from top to bottom with additional random noise 
and the other grows linearly from top to 0 from the bottom 
and then decrease linearly till the bottom of the pinnae. 


Moreover, after all components are joined together, ad- 
ditional texture noise is added on the mesh to create more 
irregularities. 


Fern Composition Each fern is a mixture of pinnae with 
random orientations. In nature, these fern pinnae typically 
bend down towards the ground. We also add young fern 
pinnae standing rigidly in the center. 


a = raVA- AB + f(l—A) 
= rpVB - AB - (f +R)B 


where بع رم"‎ f, k are pre-specified parameters. After 1k 
iterations, the field A is stored to the mesh sphere S. Then 
we build another polygon mesh P, which follows simple 
deformation by geometry nodes. We apply shrinkwrap object 
modifier from S to P, which also maps the field A onto mesh 
P. The shrinkwrap object modifiers ‘wraps’ the surface of A 
onto P by finding the projected points of A along A’s normal 
direction. We displace P’s surface using the projected field 
A, which forms the grooves on the mesh surface. For Brain 
Corals, we choose the parameters so that f = V/k/2 — k for 
some specific k, i.e. the kill rate and feed rate of system are 
on the saddle-node bifurcation boundary, so that the surface 
is groovy. 


Honeycomb Coral is modeled after corals from genus 
Favia or Mussismilia, as shown in Fig. Re). It features 
honeycomb-shaped holes on the surface of the coral. We 
use the same Gray-Scott reaction-diffusion model [23] as 
the Brain Corals. We choose the parameters so that f = 
Vk/2 — k — 0.001 for some specific k, i.e. the kill rate 
and the feed rate of the system are between the saddle-node 
bifurcation boundary and the Hopf bifurcation boundary, 
which yields the honeycomb-shaped holes. 


Bush Coral is modeled after corals from genus Acropora, 
which includes the Staghorn coral, as shown in Fig. Rf). We 
implement bush coral using our tree skeleton generator as 
discussed. In particular, the skeleton of tree coral has three 
levels of configuration, with each level of configuration spec- 
ifying how a branch would grow in terms of directions and 
length, as well as where new branches would emerge. After 
the tree skeleton is generated, we convert the 1D skeleton to 
3D mesh by specifying radius at each vertex of the skeleton, 
with the vertices closer to endpoints having a smaller radius. 


Twig Coral is modeled after corals from genus Oculina, as 
shown in Fig. Rg). We use the same tree skeleton generator 
as in Bush Corals. We choose a separate set of parameters 
so that Twig Corals are more low-lying and less directional 
than Bush Corals. 


Tube Coral is modeled after not corals, but sponges from 
genus Aplysina, which none-the-less lives in the same habitat 
as corals, as shown in Fig. Rh). We first generate the base 
mesh as the dual mesh of a deformed icosphere, whose faces 


mesh when it is repeated for multiple iterations. In specific, 
for Leather Corals, we choose the parameters so that the 
noise force function and growth force function are large, 
and the growth iterations stop when there’s 1k faces. To 
convert this mesh to the final coral mesh, we apply smooth 
and subsurface operation on the mesh, followed by a solidify 
operation that gives the planar mesh some width, converting 
it into a 3D mesh. 


Table Coral is modeled after corals from genus Acropora, 
as shown in Fig. Rb). It features a flat table with curvy sur- 
faces near the boundary. For Table Corals, we use the same 
Differential Growth method as the Leather Coral. However, 
we choose a different set of parameters for force application. 
More concretely, we use a larger repulsion force, apply less 
displacement on boundary vertices, and stop the process 
after there is 400 faces in the mesh. 


Cauliflower Coral is modeled after corals from genus 
Pocillopora, as shown in Fig. Rc). It features wart-like 
growth on its surface. We implement Cauliflower Coral 
after [42]’s simulation of dendritic crystal growth. In this 
growth simulation setup, we have two density fields A and 
B over 3D space. The simulation follows the following PDE: 


m = aarctan (y(T — b)/ 7) 


- = VA + A(1 - A)(A-1/2 + m)/r 
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where a, y, T,€, T7, k are pre-specified parameters. We run 
this PDE simulation on a 3D grid space with forward Euler 
method for 800 iterations. The resulting density map A, is 
used to generate the 3D mesh for the coral via the marching 
cube mesh conversion method. 


Brain Coral is modeled after corals from genus Diploria, 
as shown in Fig. Rd). It features groovy surface and intricate 
patterns on the surface of the coral. We first create the 
surface texture with reaction-diffusion system simulation. In 
particular, we start off from a mesh icosphere, and run Gray- 
Scott reaction-diffusion model on its vertex graph, where 
edges in the mesh are the edges of the graph. This simulation 
has two fields A, B on individual vertices, following the 
equation of: 


G.4. Surface Scatters 


Moss forms dense green clumps or mats on a flat surface. 
We create individual moss using Bezier curves as skeletons, 
and then turning them into 3D mesh. We distribute individ- 
ual moss instance on the boulder surfaces with the Z axis 
aligned at an angle with the surface normal. Such angle of 
rotation is guided by a Musgrave texture so that mosses in a 
neighbourhood have similar rotations. Moss instances have 
shaders that is built from a mixture of several yellowish to 
greenish colors, with the color variation determined by a 
Musgrave texture. Moss can grow on three different places 
of the boulder, from the faces with a higher Z coordinate 
than a threshold, or those whose face normal is within a 
threshold of the positive Z axis, or near edges where there 
are concave edges. 


Lichen is a composite organism that arises from algae that 
forms a mat on rock surfaces. Individual lichens are made 
from Differential Growth specified in Appendix G.3.6. The 
color of lichens comes from a mixture of yellowish-greenish 
color and white colors, with the mixing ratio guided by a 
Musgrave texture. Lichen can grow on either boulders or on 
tree trunks. On boulders, lichens are distributed on all faces 
of the boulder, or the lower portions of tree trunks with a 
minimal distance between two instances. 


Slime mold are organisms shaped like gelatinous slime 
that lives on decaying plant materials. We first designate 
around 20 initial seedling vertices where the slime mold can 
grow from, and assign a random weight proportional to the 
local convexity to all edges in the chosen area of the mesh. 
Then we use geometry nodes to compute the shortest path 
from each vertex to any of the seedling vertices, and connect 
these shortest path. These shortest paths from the skeleton of 
the slime mold. Slime mold only grows on the lower portion 
of tree trunks 


Pine needles Pine needles are the leaves of the pine tree 
that have fallen onto the ground, as shown in Fig. V. Pine 
needles are made from a segment of a ellipsoid and are 
typically of brownish and greenish colors. Pine needles are 
scattered on certain parts of the terrain surface based on 
noise texture. Pine needles are scattered on different heights 
so that pine needles of a certain color are above pine needles 
of another color. 


G.5. Marine Invertebrates 


0.5.1  Mollusk 


Mollusk is the collection of animals that includes most snails 
and shells, as shown in Fig. W. 


have between 5-6 vertices. The optionally extrude some its 
upward facing faces along the direction that is approximately 
along the positive Z-axis. This types of extrusion happens 
multiple times with different multiple extrusion length. The 
extruded mesh will have its topmost face removed and will 
be applied with the solidify object modifier so that the mesh 
would become a hollow tube. 


Coral tentacles After the main body of a coral is created, 
we add a high frequency noise onto the coral surface. We 
then add tentacles to the coral mesh. Each tentacle has a 
low-poly mesh generated from the tree generator system that 
sprawls outside the coral body. Tentacles are distributed on 
certain parts of the coral body, like on top-facing surfaces or 
outermost surfaces. 


G.3.7 Other sea plants 


Besides corals, we also provide assets of other sea plants, 
like kelps and seaweeds. Both kelps and seaweeds observe 
the global oceanic current, which is unique for the entire 
scene. We show examples of kelps and seaweeds in Fig. S. 


Kelps Kelps are large brown algae that lives in shallow 
waters, as shown in Fig. Sa). To build kelps, we first build 
meshes for individual kelp leaves. To do so, we first create 
a planar mesh which is bounded by two sinusoid functions 
from two sides. It is then deformed along the Z axis an 
subsurfaced and make a wavy shape. For the kelp stem, we 
first plot a Bezier curve with its control points following a 
Brownian process with drift towards the sum of the posi- 
tive Z direction and the oceanic current drift. The curve is 
turned into a mesh with a small radius, and this becomes 
the mesh of the kelp stem. Along the stem we scatter points 
with fixed intervals, where we place kelp leaves along its 
normal direction. Finally, we rotate the kelp leaves towards 
somewhere lower than the oceanic current vector, so that 
they are affected by both the oceanic current and gravity. 


Seaweeds Seaweeds are another class of marine algae that 
are shorter than kelps, as shown in Fig. Sb). We create sea- 
weed assets using the Differential Growth model as in the 
Leather Coral assets (See Section Appendix G.3.6). In par- 
ticular, we choose the parameters so that it has a large growth 
force towards the positive Z axis. We apply smoothing and 
subsurfacing to the 2D mesh and solidify it into a 3D mesh. 
Then the seaweed is bent towards the direction of the oceanic 
current by a varying degree using the simple deform object 
modifier. Seaweeds are scattered on the surface of the terrain 
mesh. 


similar to ellipsoids with large eccentricities. 


Mollusk material Both snails and shells grows along a 
certain direction and leaves a changing color pattern along 
its growing direction. We define a 2D coordinate (U,V) 
the mollusks surface for mapping textures, whether U is the 
growth direction and V is orthogonal to it. For snails, U is 
the direction of displacement for its cross section mesh, and 
V is along the boundary of the cross section mesh. For shells, 
U is the direction from the center of the shell to the boundary 
of shell, and V is along the boundary of the shell. We design 
a saw-like wavy texture that progresses along either U or V 
directions, which creates interchanging color patterns along 
or orthogonal to the direction of growth. Both snails and 
shells are given a low-frequency surface displacement, and 
scatter on the terrain mesh. 


G.5.2 Other marine invertebrates 


Jellyfish is composed of its cap and tentacles. For the cap, 
we first generate two mesh uv spheres with one above an- 
other, then scale and deform both spheres. Then we subtract 
the sphere below from the the sphere above, creating the cap 
with two surfaces, one facing towards the positive Z axis 
and another facing towards the negative Z axis. Tentacles 
are made from ribbons along the Z axis, which are later 
deformed along the X axis and tapered, and finally rotated 
around the Z axis. Tentacles of varying sizes are placed 
around the lower surface of the cap. The jellyfish shader 
is made from a mixture of colored emission, a principled 
BSDF with transmission, and a colored transparent shader, 
whose mixing ratio is guided by Fresnel coefficients. We 
use a more transparent material for outer surface of the cap 
and shorter tentacles, which are more peripheral parts of the 
body, and a more opaque material for inner surface of the 
cap and longer tentacles, which are core organs of a jellyfish. 
Jellyfish are scattered with a random offset above the ground 
mesh. 


Urchin is a spiny, globular animal living on the sea floor. 
For modeling urchin assets, we first start with an icosphere. 
For each face of the icosphere, we extrude it outwards by 
a small distance, scale it down, and extrude it inwards to 
form the girdle. We then extrude the faces outwards by a 
varying but large distance, and scale it down to zero so that 
they form the spikes that ground on the urchin. The bases 
of an urchin are from a darker color and the spikes are from 
a lighter color between purple and yellow, with the girdle’s 
color somewhere in between. 


Snails is modeled after animals mostly from the class Gas- 
tropoda. It features snails of following shape: Conch (Fig. W 
a), from family Strombidae, Auger (Fig. W b), from family 
Terebridae), Volute (Fig. W c),from family Volutidae) and 
Nautilus (Fig. W d),from a different class Cephalopod). All 
these class of animals share the commonality that they live 
in a shell that grows and rotates at a constant angle as the 
soft body of the animal grows, which can be modeled using 
the array object modifier in Blender. We first build the cross 
section of these snails from the interpolation of a start and a 
ellipsoid, which gives the space for the soft body to live in, 
as well as the pointy spikes on some snails’ surface. Then 
we apply the array object modifier onto the cross section so 
that it is rotated with a constant angle, scaled at a constant 
ratio, and displaced at a constant interval. These series of 
cross section can uniquely define the cross section at each 
stage of the growth, with which we can bridge the edge loops 
to finally form a 3D mesh for the whole snail. Parameters in 
this process includes the displacement of cross section along 
and orthogonal to the axis, the total number cycles of rota- 
tions, the ratio that the scale of cross section shrinks, and the 
other parameters with regard to the shape of the base cross 
section. The parameters are set differently for individual 
class of snails: Conch and Auger have large displacement 
along their axis while Volute and Nautilus have almost none; 
Conch is made from more spiky cross section mesh, and 
has less overlapping chambers than Auger; Nautilus has a 
faster-shrinking cross section, almost no displacement along 
the axis, and have less over lapping chambers than Volute. 


Shells is modeled after animals mostly from the class Bi- 
valvia. It features shells of the following shape: Scallop 
(Fig. W e), from family Pectinidae), Clam (Fig. W f), from 
family Veneridae and Mussel (Fig. W g), from family Myztil- 
idae). These animals share the commonality that they are 
covered by two symmetrical shells joining at a point that 
folds around the soft body of the individual, with both shells 
growing gradually from inside the shell. To build assets 
for these shells, we first generate individual shells. We first 
create a mesh circle and select one point on the circle as 
its origin. For all points on the circle, we scale its distance 
from the origin, with the ratio determined by the direction 
from the origin to the target point. Then we choose a point 
above the XY-plane, and interpolate between the previous 
mesh and the newly selected point. The interpolation ratio is 
determined by a point’s distance to the boundary, so that the 
boundary points are the XY plane, which creates the convex 
shape of an individual shell. We mirror the shell at an angle, 
and now we have the 3D mesh of the shell. We have different 
designs for distinct class of shells: Scallops are given a wavy 
pattern depending on vertices’ direction to the origin, and 
have girdles near the origin; Clams are the most basic shells 
with no alternations; Mussels are made from shells that are 


curves specify the position of each foot as a function of 
time. They can be elongated or scaled in height to achieve a 
gallop or trot, or even rotated to achieve a crab walk or walk- 
ing in reverse. Once the paths are determined, we further 
adjust gait by choosing overall RPM, and offsets for how 
synchronized each foot will be in it’s revolution. 


0.6.3 Genome Templates 


In the main paper, we show the results of our realistic Car- 
nivore, Herbivore, Bird, Beetle and Fish templates. These 
templates contain procedural rules to determine tree structure 
by adding legs, feet, head and appropriate details to create a 
realistic creature. Each template specifies distributions over 
all attachment parameters specified above, which provides 
additional diversity ontop of that of the parts themselves. 
Tree topology is mostly fixed for each template, although 
some elements like the quantity of insect legs and presence 
of horns or fish fins are random. 

Our creature system is modular by design, and supports 
infinite combinations besides the realistic ones provided 
above. For example, we can randomly combine various crea- 
ture bodies, heads and locomotion types to form a diverse 
array of creatures shown in Fig X. As we continue to im- 
plement more independent creature parts and templates, the 
possible combinations of these random creature genomes 
will exponentially increase. 

Creature genomes also support a semi-continuous inter- 
polation operation. Interpolation is trivial for creatures with 
identical tree structure and part types - one can perform 
part-wise linear interpolation of node and edge parameters. 
When tree topology or part types don’t match, we recursively 
compute a matching for each node’s children which mini- 
mizes the difference of edge attachment parameters, then 
perform linear interpolation on any node parameters with 
matching names. To interpolate between a present and miss- 
ing genome node, we scale the part down from its original 
size to 0, which results in small vestigial arms or tails on the 
intermediate creatures. When part types do not align exactly, 
there is a discrete transition halfway through interpolation, 
so interpolation is not continuous in all cases. 


G.6.4 Creature Parts 


NURBS Parameterization Many of our creature body 
and head templates are comprised of non-uniform rational 
B-splines (NURBS). NURBS are the generalized 3D analog 
of a Bézier curve. In order to form a closed shape, we 
pinch each NURBS surface closed at its ends, and loop it’s 
handles in the V direction to form a closed cylinder as a 
starting point. We set the U and V knot-vectors to be Pinned 
Uniform, and instead rely on densely-spaced or coincident 
handles to create sharp edges where necessary. 


G.6. Creatures 


G.6.1 Creature Construction 


Each creature genome is a tree of parameters, with nodes 
specifying parts and edges specifying attachment. 

Each node contains a dictionary of named input parame- 
ters for one of our part templates (Sec. G.6.4). We compute 
all parts in isolation before proceeding to attach them. This 
part template must produce 1) a mesh and 2) a skeleton line. 
The skeleton line is a 3D parametric curve specifying the 
center line of the part, and is used for attachment and rig- 
ging. Requiring part templates to produce a center line is 
not a limitation - for NURBS and most node-graph parts it 
is trivial to obtain. Additionally, this output can be omitted 
for any part not intended to have further children attached to 
it. Each part template may also produce additional metadata 
for use in the animation and material stages. 

Each edge contains a coordinate (u, vu, r) to determine 
the attachment location. (u, v) € [0, 1]? specifies a location 
on the parent mesh’s surface. For arbitrary meshes, this 
is computed by travelling u percent of the way along the 
parent’s skeleton and raycasting orthogonally to it, with 
angle 360° + v. If the parent part is a NURBS, one can 
instead query (u, v) on it’s parametric surface. Finally, we 
use r to interpolate between the found surface point, and 
the corresponding skeleton point, which has the effect of 
controlling how close to the surface the part is mounted. 

Finally, each edge specifies a relative rotation used to pose 
the part. Optionally, this rotation can be specified relative to 
the parent part’s skeleton tangent, or the attachment surface 
normal. 


G.6.2 Creature Animation 


As an optional extra output, each part template may specify 
where and how it articulates, by specifying some number of 
joints. Each joint provides specifies rotation constraints as 
min/max euler angles, and a parameter t € [0, 1], specifying 
how far along the skeleton curve it lies. If a part template 
specifies no joints, it’s skeleton is assumed to be rigid, with 
only a single joint at t = 0. We then create animation bones 
spanning between all the joints, and insert additional bones 
to span between parts. 

Any joint may also be tagged as a named inverse kinemat- 
ics (IK) target, which are automatically instantiated. These 
targets provide intuitive control of creature pose - a user or 
program can translate / rotate them to specify the pose of 
any tagged bones (typically the head, feet, shoulders, hips 
and tail), and inverse kinematics will solve for all remaining 
rotations to produce a good pose. 

We provide simple walk, run and swim animations for 
each creature. We procedurally generate these by construct- 
ing parametric curves for each IK target. These looped 


The model also has parameters that control how much the 
tip of the beak hooks and how much the middle and bottom 
of the beak bulge, to cover different types of beaks. The 
lower part is modeled by the same model of the upper part 
with reversed Z coordinates. Along with the model, we 
provide four parameter sets to create eagles, herons, ducks 
and sparrows’ beaks templates. 


Node-Graph Creature Parts All part templates besides 
those mentioned above are implemented as node-graphs. 
We provide an extensive library of node-groups to ease the 
construction of creature parts. The majority of these involve 
placement and querying of parameterized tubes, which we 
use to build muscular legs, arms and head parts. For example, 
our Tiger Head and Quadruped Leg templates contain nodes 
to construct the main central form of each part, followed 
by placement of several tubes along their length to create 
muscles and detailed forms. This results in a randomizable 
representation of face and arm musculature, which produces 
the detailed carnivore heads and legs shown in the main 
paper. These node-graph tools can also be layered ontop of 
NURBS as a base representation. 


By default, a NURBS cylinder is represented as an N x M 
array of 3D handle locations. We find the space of all 
NURBS handle configurations too high dimensional and 
unstructured to randomize directly. Adding Gaussian noise 
to handle locations produces lumpy, unrealistic creatures, 
and is unlikely to coordinate to create phenomena like bent 
limbs or widened midsections. Instead, we randomize under 
a factored representation. Specifically, we start with a Nx3 
array of radii and relative angles, which stores a center line 
for the part as polar-coordinate offsets. Accumulating these 
produces an N z3 skeleton line. We arrange N profile shapes 
around this center line, each stored as M 3D points centered 
about the origin. This representation has just as many pa- 
rameters as the original, but randomizing it produces better 
results. Adding noise to the polar skeleton angles and radii 
produces macro-scale changes in body or head shape, and 
multiplying the profiles by random scalars can easily change 
the radius or cross section, independent of where along the 
skeleton that profile is located. 

As a starting point for this randomization, we determined 
skeleton and profile values which visually match a large 
number of reference animal photos, including Ducks, Gulls, 
Robins, Cheetahs, Housecats, Tigers, Wolves, Bluefish, Crap- 
pie Fish, Eels, Pickernel Fish, Pufferfish, Spadefish, Cows, 
Giraffe, Llama and Goats. Rather than make a discrete 
choice of which mean values to use, we take a random con- 
vex combination of all values of a certain category, before 
applying the extensive randomization described above. 


Horns are modeled by a transpiled Blender node graph. 
The 2D shapes of horns are based on spiral geometry node, 
supporting adjustable rotation, start radius, end radius and 
height. The 3D meshes then are constructed from 2D shapes 
by curve-to-mesh node, along with density-adjustable depth- 
adjustable ridges. Along with the model, we provide three 
parameter sets to create goats, gazelles and bulls’ horn tem- 
plates. 


Hooves are modeled by NURBS. We start with a cylinder, 
distributing control points on the side surface evenly. For 
the upper part of the cylinder, we scale it down and make it 
tilted to the negative X axis, which makes its shape closed 
to the horseshoe. The model also has an option to move 
some control points toward the origin, in order to create 
cloven hooves for goats and bison. Along with the model, 
we provide two parameter sets to create horses’ and goats’ 
horns templates. 


Beaks are modeled by NURBS. Bird beaks are composed 
of two jaws, generally known as the upper mandible and 
lower mandible. The upper part starts with a half cone. We 
use the exponential curve instead of the linear curve of the 
side surface of the cone to obtain the natural beak shape. 


Figure E. 144 randomly generated, non-cherry-picked images of terrain produced by our system (Part 1 of 2). Images are compressed due to 
space constraints - please see 


Figure F. 144 randomly generated, non-cherry-picked images of terrain produced by our system (Part 2 of 2). Images are compressed due to 
space constraints - please see 
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Figure G. Qualitative results on natural stereo photographs. Rectified input images are captured at 2208 x 2484 resolution using a calibrated 
ZED 2 stereo camera [79]. Our data generator helps RAFT-Stereo generalize well to real images of natural objects. 
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Figure H. Qualitative results on the Plant and Australia Middlebury [74] test images. RAFT-Stereo trained using Infinigen generalizes well 
to images with natural objects. 
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Figure I. High-Resolution Ground Truth Samples. We show select ground truth maps for 8 example Infinigen images. For space reasons, we 
show only Depth, Surface Normals / Occlusion and Instance Segmentation. Our instance segmentation is highly granular, but classes can be 
grouped arbitrarily using object metadata. See Sec.C.2 for a full explanation. 


(b) Depth from Blender’s built-in render passes. 


Figure K. Our ground truth is computed directly from the underly- 
ing geometry and is always exact. Prior methods [9, 24, 27, 29,48] 
generate ground-truth from Blender’s render-passes, which leads to 
noisy depth for volumetric effects such as water, fog, smoke, and 
semi-transparent objects. 


Face Area (cm?) 


(a) The area of mesh faces in cm?. Our dynamic-resolution scaling causes 
faces closer to the camera to be smaller. 


Distance (meters) 


(b) Distance of faces from the camera (i.e. depth). Distance is propor- 
tional to the area of faces. 
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(c) Face area measured in pixels. Our dynamic resolution scaling causes 
individual mesh faces to appear approximately one pixel across. 


Figure J. Dynamic Resolution Scaling. Faces further from the 
camera are made smaller (a) such that they appear to be the same 
size from the camera’s perspective (c). We show the depth map for 
reference (b). 
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Figure L. Resource requirements for creating a pair of stereo 1080p images using Infinigen. Our mesh resolutions scale with the output 
image resolution, such that individual mesh faces are barely visible. As a result, these statistics will change for different image resolutions. 
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Figure M. An example node-graph, as shown as input to the tran- 
spiler in Fig. 3 of the main paper. Dark green nodes are node 
groups, containing user-defined node-graphs as their implementa- 
tions. Red nodes show tuned constants, with annotations for their 
distribution. 


Figure O. We control the shape of bushes by by specifying the 
distributions of the attraction points. Each row are the same bush 
species with different shapes (left to right: ball, cone, cube). 


Figure P. All classes of cacti included in Infinigen. Each row contains one class of cactus: a) Globular Cactus; b) Columnar Cactus; ©( 
Pricky pear Cactus. 
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Figure Q. Assets of fern pinnae included in Infinigen. A fern consists of a random number of pinnae in the same color with random 
orientations. 


Figure R. All classes of corals included in Infinigen. Each block contains one class of coral: a) Leather Coral; b) Table Coral; ك‎ Cauliflower 
Coral; d) Brain Coral; e) Honeycomb Coral; f) Bush Coral; g) Twig Coral; h) Tube Coral. 


Figure S. Kelps a) and seaweeds b) examples, each occupying one row. 


Figure T. Boulder assets with different rock cover surfaces. In particular, each row of boulders are under a) no surface; b) moss surface; ) 
lichen surface. 
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Figure U. Mushrooms a) and pinecones b). 


Figure V. Pine needles scattered onto the ground with varying density. 


Scallop, f) Clam, g) Mussel. 


, d) Nautilus, e) 


of mollusks, with each block representing a class of mollusk: a) Conch, b) Auger, © Volute. 


Figure W. Different classes 
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Figure X. We provide procedural rules to combine all available creature parts, resulting in diverse fantastical combinations. Here we show a 
random, non-cherry-picked sample of 80 creatures. Despite diverse limb and body plans, all creatures are functionally plausible and possess 
realistic fur and materials. 


Named Parameters 


Noise Scale, Cracks Scale 

Color Brightness, Wave Scale, Wave Distortion, Noise Scale, Noise Detail 

Stone Scale, Uniformity, Depth, Crack Width, Stone Colors (5), Mapping Positions (2), Roughness 

Low Freq. Bump Size, Low Freq Bump Height, Crack Density, Crack Scale, Crack Width, 001011, Color2, 
Noise Detail, Noise Dimension 

Chunk Scale, Chunk Detail, Colorl, Color2 

Color1 

Speckle Scale, Color1, Color2, Speckle Color1, Speckle Color2, Speckle Color 3 

Color, Roughness, Distortion, Detail, Uneven Percent, Transmission, IOR 

Wetness, Large Bump Scale, Small Bump Scale, Puddle Depth, Percent water, Puddle Noise Distortion, Puddle 
Noise Detail, Color1, Color2, Color3, WaterColor1, WaterColor2 


Ridge Polynomial (2), Ridge Density, Ridge High Freq., Ridge Noise Mag., Ridge Noise Scale, Ridge 
Disp:Offset Magnitude, Roughness, Crack Magnitude (2), Crack Scale, Color1, Color2, Dark Patch Percentages 
(3), Micro Bump Scale, Micro Bump Magnitude 

Average Roughness, Grain Scale, Subsurface Scattering 

Pebble Sizes (2), Pebble Noise Magnitudes, Pebble Roundness, Pebble Amounts, Voronoi Scale, Voronoi 
Mag., Base Colors (2), Darkness Ratio 

Rock Scale, Rock Deepness (2), Noise Detail, Noise Roughness, Crack Scale, Crack Width, Color1, Color2, 
Roughness 

Bump Offset, XY Ratio 


Blackbody Intensity, Smoke Density 

Color, Density 

Wave Scale, Choppiness, Foam, Main Color, Cloudiness 

Color, Rock Roughness, Amount of Rock, Lava Emission, Min Lava Temp., Max Lava Temp., Voronoi Noise, 
Turbulence, Wave Scale, Perlin Noise 

Color, Scale, Detail, Lacunarity, Height 

Ripple Scale, Detail, Ripple Height, Noise Dimension, Lacunarity, Color 

Color, Foam Color, Foam Density 


Displacement Scale, Z Noise Scale, Z Noise Amount, Z Multiplier, Primary Voronoi Scale, Primary Voronoi 
Randomness, Secondary Voronoi Mix Weight, Secondary Voronoi Scale, Color 

Noise Scale (2), Noise Detail (2), Displacement Scale, 

Color Noise (3), Roughness Noise (3), Roughness Min/Max (2), Translucence Noise (3), Translucence 
Min/Max (2) 

Scale, XY Ratio, Offset 

Wave Scale, Wave Distortion, Musgrave Scale, Musgrave Distortion, Roughness Min/Max (2), Translucence 
Base Color, Vein Color 

Diffuse Color, Translucent Color, Translucence, Center Colors (2), Center Color Coeff. 

Bright Color, Dark Color, Light Color, Fresnel Color, Musgrave Scale 

Edge Weight, Spline Parameter Cutoff, Seedlings Count, Min Distance, Bright Color, Dark Color, Musgrave 
Scale 

Bright Color, Dark Color, Musgrave Scale, Density, Min Distance, Instance Scale 


Bird Type, Head Ratio, Stripe Width, Stripe Noise, Neck Ratio, Color1, Color2. 
Bump Scale, Bump Frequency, Bump Offset. 

Boundary Width, Noise Weight, Thorax Size. 

Noise Scales (2), Noise Details (2), Mapping Control Points (4) 

Circle Scale, Circle Boundary, Noise 

Scale Size, Scale Noise, Fish Type, Scale Offset, Color1 Ratio, Color2 Ratio, Noise 
Offset Z, Offset Y, Shape, Bump Noise, Fin Type, Bump Weight, Transparency 
Scale, Noise, Circle Width, Belly 

Scale, Offset, Noise 

Noisel, Noise2 

Color1, Color2 

Scale Size, Scale Noise, Scale Rotation 

Scale, Offset 

Spot Scale, 00101, 2 

Spotl Ratio, Spot2 Ratio 

Belly, Stripe Distortion, Stripe Frequency, Stripe Shape 

Offset, Spot Scale, Ratio, Noise 

UV Pattern Ratio, Scale, Distortion, Pattern Type, Hue Range, Saturation Range, Value Range, Colors Per 
Pattern 


Interpretable 
DOF 


2 
5 
13 
9 
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Total: 1 


Material Generators 


Mountain 
Sand 
Cobblestone 
Dirt 


Chunky Rock 
Glowing 
Granite 

Ice 

Mud 


Rock 
Sandstone 


Snow 
Soil 


Stone 


Aluminium 


Fire 
Smoke 
Ocean 
Lava 


Surface water 
Water 
Waterfall 


Bark 


Bark Birch 
Greenery 


Wood 

Grass 

Leaf 

Flower 

Coral Shader 
Slime Mold 


Lichen, Moss 


Bird 

Bone 

Chitin 

Horn 

Reptile Brown 
Fish Body 
Fish Fin 
Giraffe 
Reptile 
Reptile Gray 
Reptile 2-Color 
Scale 

Slimy 

Spot Sparse 
3-Color Spots 
Tiger 

2-Color Spots 
Mollusk 


Num. Generators: 50 


Table C. Our full system contains 182 procedural asset generators and 1070 interpretable DOF. Here we show parameters for just our 


Material Generators. 


Named Parameters 


Cavern Size, Tunnel Frequency, Fork Frequency 
Rock Frequency, Warping Frequency 

Dune Frequency, Warping Frequency 

Mountain Frequency, Num. Scales 

Coast curve frequency, Height mapping function 


Tile Types, Tile Heights, Tile Frequency, Element Probabilities, Water Level, Snow 


Interpretable 
DOF 


2 نن دم دم دم نم هه 2 25 5 1 


DOF: 17 


Terrain Generators 


3D Noise, Wind-Eroded Rocks 
Caves 

Voronoi Rocks, Grains 

Sand Dunes 

Mountains, Floating Islands 
Coast line 

Ground Slope 

Still Water, Ocean 
Atmosphere 

Tiled Landscape 


Scene Types (Arctic, Canyon, 
Cave, Cliff, Waterfall, Coast, 
Desert, Mountain, Plain, River, 
Underwater, Volcano) 


Num. Generators: 26 


Table D. Our full system contains 182 procedural asset generators and 1070 interpretable DOF. Here we show parameters for just our Terrain 
Generators. Terrain is heavily simulation and noise-based, so has few interpretable DOF but uncountable internal complexity. 


Named Parameters 
Density, Mass, Lifetime, Size, Damping, Drag, 


Density, Anisotropy, Noise Scale, Noise Detail, Voronoi Scale, Mix Factor, Increased Emission, Angular 
Density, Mapping Curve (6) 

Density Min, Density Max, Color, Noise Scale, Anisotropy 

Viscocity, Viscocity Exponent, Surface Tension, Velocity Coord, Spray Particle Scale, Flip Ratio 

Max Temp, Gas Heat, Bouyancy, Burn Rate, Flame Vorticity, Smoke Vorticity, Dissolve Speed, Noise Scale, 
Noise Strength, Surface Emission, Turbulence Scale, Turbulence Strength 

Overall Intensity, Sun Size, Sun Intensity, Sun Elevation, Altitude, Air Density, Dust Density, Ozone Density 
Scale, Sharpness, Coordinate Warping, Power, Spotlight Blending 

Wattage, Colors, Shape Distortion 

Wattage, Light Size, Blending 


Interpretable 
DOF 


6 


13 
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DOF: 61 


Lighting, Weather 
& Fluid Generators 


Dust, Rain, Snow, Windy 
Leaves 
Cumulus, Cumulonimbus, Stra- 
tocumulus, Altocumulus 
Atmospheric Fog, Dust 
Lava/Water 


Fire/Smoke 


Sky Light 

Caustics 

Glowing Rocks 

Camera Lighting (Flashlight, 
Area Light) 


Num. Generators: 19 


Table E. Our full system contains 182 procedural asset generators and 1070 interpretable DOF. Here we show parameters for just our 
Lighting, Weather and Fluid Generators 


Named Parameters 


Aspect Ratio, Deform, Roughness 

Num. Extrusions, Length, Z Offset Variance 

Initial Vertices Count, Is Slab, Large Extrusion Probability, Small Extrusion Probability, Large Extrusion 
Distance, Small Extrusion Distance 


Interpretable 
DOF 


3 
3 
6 


DOF: 12 


Rock Generators 
Rocks 


Stalagmite / Stalactite 
Boulder 


Num. Generators: 4 


Table F. Our full system contains 182 procedural asset generators and 1070 interpretable DOF. Here we show parameters for just our Rock 


Generators. 


Named Parameters 


Center Radius, Petal Dimensions (2), Seed Size, Petal Angle Range (2), Wrinkle, Curl 

Stem Curve Control Points, Stem Rot. Angle, Polar Mult. X, X Wave Control Points, Y Wave 
Control Points, Warp End Rad., Warp Angle 

Midpoint (2), Length, X Angle Mean 

Midrib Length, Midrib Width, Stem Length, Vein Asymmetry, Vein Angle, Vein Density, 
Subvein Scale, Jigsaw Scale, Jigsaw Depth, Midrib Control Points, Shape Control Points, Vein 
Control Points, Wave X, Wave Y 

Stem Control Points, Shape Curve Control Points, Vein Length, Blade Angle, Polar Multiplier, 
Vein Scale, Wave, Scale, Margin Scale, X Wave Control Points, Y Wave Control Points 

Bud Float Curve Angles, Bud Float Curve Scales, Bud Float Curve Z Displacements, Bud 
Instance Rotation Perturbation, Bud Instance Probability, Bud Instance Count, Profile Curve 
Radius, Max Bud Rotation, Rotation Frequency, Stem Height, Bright Color, Dark Color , 
Musgrave Scale 

Subdivision, Z Scale, Bevel Percentage, Spike Probability, Girdle Height, Extrude Height, 
Spike Scale, Base Color, Girdle Color, Spike Color, Transmission, Subsurface Ratio 

Ocean Current, Deform Angle, Translation Scale, Expansion Scale, Bright Color, Dark Color, 
Musgrave Scale 

Cap Height, Cap Scale, Cap Perturbation Scale, Long Opaque Tentacles Count, Short Trans- 
parent Tentacles Count, Arm Screw Angle, Arm Screw Offset, Arm Taper Factor, Arm Dis- 
placement Strength, Arm Min Distance, Arm Placement Angle Threshold, Bright Color, Dark 
Color, Transparent Color, Fresnel Color, Musgrave Scale 

Ocean Current, Axis Shift, Axis Length, Axis Noise Stddev, Leaf Scale, Leaf Float Curve 
Length, Leaf Float Curve Width, Leaf Float Curve Z Displacement, Leaf Rotation Perturb, 
Leaf Tilt, Leaf Instance Probability, Leaf Instance Rotation Stride, Leaf Instance Rotation 
Interpolation Factor, Leaf Instance Count 

Top Control Point, Shell Interpolation Ratio, Shell Float Curve Angles, Shell Float Curve 
Scales, Radial Groove Scale, Radial Groove Frequency, Hinge Length, Hinge Width, Angle 
Between Shells 

Cross Section Affine Ratio, Cross Section Spiky Perturbation, Cross Section Concavity, Lateral 
Movement, Longitudinal Movement, Rotation Frequency, Scaling Ratio, Loop Count 
Intialization Bump Count, Initialization Bump Stride, Timesteps, Step size, Diffusion Rate A, 
Diffusion Rate B, Feed Rate, Kill Rate, Perturbation Scale, Smooth Scale 

Face Perturbation, Short Extrude Length Range, Long Extrude Length Range, Extrusion 
Direction Perturbation, Drag Direction, Extrusion Probability 

Timesteps, Kill Rate, Step size, Tau, Eps, Alpha, Gamma, Equilibrium Temperature 

Branch Count, Secondary Branch Count, Tertiary Branch Count, Horizontal Span, Length, 
Secondary Length, Tertiary Length, Base Radius, Radius Decay Ratio 

Colony Count, Max Polygons, Noise Factor, Step Size, Growth Scale, Drag Vector, Replusion 
Radius, Inhibit Shell Factor 

Min Distance, Z Angle Threshold, Radius Threshold, Density, Branch Count, Branch Length, 
Color 

Num. Blades, Length Std., Curl Mean, Curl Std., Curl Power, Blade Width Variance, Taper 
Mean, Taper Variance, Base Spread, Base Angle Variance 

Pinna Rotation (2), Pinnae Rotation (2), Pinnae Gravity, Age, Age Variety, Num Pinna, Pinnae 
Contour, Num Pinnae Varieties, Num Leaves, Leaf Width Randomness, Num Pinnae, Pinnae 
Rotation Randomness (2), 

Cross Section Float Curve Angles, Cross Section Float Curve Scales, Cross Section Center 
Offset, Cross Section Z Rotation, Stem Length, Stem Radius, Cap Groove Ratio, Cap Scale 
Ratio, Cap Radius Float Curve Height, Cap Radius Float Curve Radius, Has Web, Umbrella 
Radius, Umbrella Height, Bright Color, Dark Color , Light Color, Musgrave Scale 

Branch Leaf Rotation, Branch Leaf Instance, Branch Stem Radius, Branch Rotation Coeff, 
Branch Leaf Density, Stem Branch Density, Stem Branch Scale, Stem Branch Range, Stem 
Leaf Instance, Stem Leaf Rotation, Stem Flower Instance, Stem Flower Scale, Stem Rotation 
Coeff. STem Radius, Num Versions, Rotation Z 

Branches Count, Secondary Branches Count, Min Distance, Top Cap Percentage, Density, 
Color 

Groove Scale, Groove Count, Rotation Frequency, Profile Curve Height, Profile Curve Radius 
Radius Decay Branch, Radius Decay Root, Radius Smoothness Leaf, Branch Count, Nodes 
Per Branch, Nodes Per Second Level Branch, Base Radius, Perturbation Scale, Groove Scale 
Leaf Profile Curve Width, Leaf Profile Curve Height, Leaf Instance Scale Ratio, Leaf Instance 
Placement Angles, Leaf Instance Count 


Interpretable 
DOF 


8 
7 


DOF: 258 


Plant & Underwater 
Generators 


Flower 
Maple 


Pine 


Broadleaf 


Ginko 


Pinecone 


Urchin 
Seaweed 


Jellyfish 


Kelp 


Shells (Scallop, Clam, Mussel) 


Snail (Volute, Nautlius, Conch) 
Reaction Diffusion Coral 
Tube Coral 


Laplacian Coral 
Tree Coral 


Diff. Growth Coral 
Coral Tentacles 
Grass Tuft 


Fern 


Mushroom 


Flower Stem 


Cactus Spikes 


Globular Cactus 
Columnar Cactus 


Pricky Pear Cactus 


Num. Generators: 30 


Table G. Our full system contains 182 procedural asset generators and 1070 interpretable DOF. Here we show parameters for just our Plant 
and Underwater Invertebrate Generators. 


Named Parameters 


Start Rad., End Rad., Ribcage Proportions (3), Flank Proportions (3), Spine Coeffs (3), Aspect Ratio 

Start Rad., End Rad. Aspect Ratio, Fullness 

Start Rad., End Rad., Snout Proportions (3), Aspect Ratio, Lip Muscle Coeff. (3) Jaw Muscle Coeff. (3), 
Forehead Muscle Coeff (3) 

Start Rad. End Rad., Neck Muscle Coeffs. (3) 

Start/End Dims (4), Proportions, Angle Offsets, Profile Offset, Bump Offset, 


Rad1, Rad2, Width Shaping, Canine Size, Incisor Size, Tooth Density, Tooth Crookedness, Tongue Shaping, 
Tongue X Scale 

Start Rad., End Rad., Aspect Ratio, Thigh Coeffs (6), Calf Coeffs (6), 

Start Rad., End Rad., Aspect Ratio, Shoulder Coeffs. (6), Forearm Coeffs (6), Elbow Coeffs (6) 

Start Rad., End Rad., Aspect Ratio, Thigh Coeffs (3), Shin Coeffs (3) 

Start Rad., End Rad., Carapace Rad. Spike Length, Spike Start Rad., Spike End Rad., Spike Range (2), Spike 
Density 

Width, Roundness, Ridge Frequency, Offset Weight (2), Ridge Rot., Affine (2), Noise Ratio (2) 

Feather Dims (3), Max Rotation (3), Rotation Randomness (3) 

Start Rad., End Rad., Feather Density, Feather Form Sculpting, Wing Extendedness, Feather Rot. Randomness 
(2) 

Curve Y, Curve Z, Hook Coeff. (2), Hook scale (2), Hook Pos. (2), Hook Thickness (2), Crown Scale, Crown 
Coeff. (2), Bump Scale, Bump L, Bump R, Sharpness 

Radius, Eyelid Thickness, Eyelid Fullness, Tear Duct Placement (3), Eye Corner Placement (3) 

Start Rad., End Rad., Depth, Thickness, Curl Angle 

Start Rad, End Rad, Curl, Aspect Ratio 

Radius, Nostril Size, Smoothness 

Claw Y Scale, Claw Z Scale, Claw Sag, Angle Length, Angle Rad. Start, Ankle Rad. End, Upper Shape, 
Lower Shape 

Rad. Start, Rad. End, Ridge Thickness, Ridge Density, Ridge Depth, Height 

Start Rad., End Rad., Toe Density, Toe Dimensions (3), Toe Splay, Footpad Radius, Claw Curl, Claw 
Dimensions (3) 

Start Rad., End Rad., Curl, Aspect Ratio 

Max Bending Stiffness, Max Compression Stiffness, Goal Spring Force, Pin Stiffness, Shear Stiffness (2), 
Tension Stiffness, Pressure 

Steps Per Second, Stride Length, Gait spread, Stride Height, Upturn, Downstroke 

Clump Num., Avoid Eyes Dist., Avg. Length, Avg. Puff, Length Noise (2), Puff Noise (2), Combing, Strand 
Noise (3), Tuft Spread, Tuft Clumping, Hair Radius, Intra-clump Noise, Length Falloff, Roughness 

Head Ratio, Head Attachment, Jaw Ratio, Jaw Attachment, Eye Attachment (3), Nose Attachment, Ear 
Attachment (3), Shoulder Dist., Shoulder Splay, Leg Ratio 

Neck Start T., Hoof Angle, Foot Angle, Head Interp Temp., Jaw Ratio, Jaw Attachment, Eye Attachment 
(3), Nose Attachment, Ear Attachment (3), Shoulder Dist., Shoulder Splay, Leg Ratio, Include Nose, Include 
Horns, Horn Attachment (3), Body Interp Temp 

Head Ratio, Head Attachment, Tail Attachment (2), Leg Length Ratio, Foot Size Ratio, Leg Attachment (3), 
Wing Length Ratio, Wing Attachment (3), Head Ratio, Eye Attacment (3) 

Leg Density, Leg Splay, Leg Length Ratio, Include Mandibles, Mandible Attachment (3), Has Hair 

Dorsal Fin Ratio, Pelvic Fin Ratio, Pectoral Fin Ratio, Hind Fin Ratio, Fin Attachment (3), Eye Attachment (3) 
Body Interp Temp. 

Has Wings, Locomotion Type, Hair Type, Interp Temperature, Head Type, Has Eyes, Nose Type, Has Jaw, Has 
Ears, Has Horns 


Interpretable 
DOF 


12 
3 
15 


5 
8 


22 


17 


DOF: 315 


Creature Generators 


Geonodes Quadruped Body 
Geonodes Bird Body 
Geonodes Carnivore Head 


Geonodes Neck 

NURBS Bodies/Heads (Carni- 
vore, Herbivore, Fish, Beetle 
and Bird) 

Jaw 


QuadrupedBackLeg 
QuadrupedFrontLeg 
Bird Leg 

Insect Leg 


Ridged Fin 
Feather Tail 
Feather Wing 


Beak 


MammalEye 
Ear 

Insect Mandible 
Nose 

Hoof 


Horn 
Foot 


Tail 

Cotton, Skin, Rubber Simula- 
tion 

Running Animation 

Short Hair, Fluffy Hair, Feath- 
ers 

Carnivore Genome 


Herbivore Genome 


Bird Genome 


Insect Genome 
Fish Genome 


Random Genome 


Num. Generators: 39 


Table H. Our full system contains 182 procedural asset generators and 1070 interpretable DOF. Here we show parameters for just our 


Creature Generators. 


Named Parameters 


Growth Height, Trunk Warp, Num. Trunks, Branching Start, Branching Angle, Branching Density, Branch 
Length, Branch Warp, Pull Dir. Vertical, Pull Dir. Horizontal, Outgrowth, Branch Thickness, Twig Density, 
Twig Scale, Twig Pts, Twig Branching Start, Twig Rot. Randomness, Twig Branching Density, Twig Init 
Z,Twig Z Randomness, Twig Subtwig Size, Twig Subtwig Momentum, Twig Subtwig Std., Twig Size Decay, 
Twig Pull Factor, Space Colonization Shape 


Interpretable 
DOF 


26 


DOF: 26 


Tree Generators 


Random Tree, Pine Tree, Bush 


Num. Generators: 3 


Table I. Our full system contains 182 procedural asset generators and 1070 interpretable DOF. Here we show parameters for just our Tree 


Named Parameters 


Asset/Scatter inclusion probabilties (39), Num. Creature/Plant Subspecies (4), Noise Mask Scales (30), Mask 
Tapering Coeff. (12), Normal Mask Thresholds (14), Placement Densities (5), Placement Habitats (6) 


Interpretable 
DOF 


110 


DOF: 110 


Generators. 


Scene Composition Generators 


Scene Generators (Arctic, 
Canyon, Cave, Cliff, Coast, 
Desert, Forest, Mountain, Plain, 
River, Under Water) 


Num. Generators: 11 


Table J. Our full system contains 182 procedural asset generators and 1070 interpretable DOF. Here we show parameters for just our Scene 
Composition Configs. Each config references a terrain composition generator from Fig. D, and specifies a realistic distribution of other 
assets to create a fully realistic natural environment. 


